file_get_contents and pageless jQuery

I am using PHP's file_get_contents function to retrieve the HTML of a Pinterest source-tracking page, which shows all pins created from a particular domain. Example: http://pinterest.com/source/google.com/

However, Pinterest uses the jQuery pageless plugin, so the initial response does not contain all of the content.

Is there a way to make file_get_contents trigger the pageless function so that the full result set is returned?

Source

2012-03-12 Andrew

I tried file_get_contents, but it didn't give me much; cURL, however, seems to work well for me.

You need curl installed on your server, along with the libcurl extension for PHP, but you can try something like this and see what you get:

<?php
    $cl = curl_init();

    // Browser-like request headers.
    $header = array();
    $header[] = "Accept: text/xml,application/xml,application/xhtml+xml,"
              . "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
    $header[] = "Cache-Control: max-age=0";
    $header[] = "Connection: keep-alive";
    $header[] = "Keep-Alive: 300";
    $header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3";
    $header[] = "Accept-Language: nb-NO,nb;q=0.8,no;q=0.6,nn;q=0.4,en-US;q=0.2,en;q=0.2";
    $header[] = "Pragma: ";

    curl_setopt($cl, CURLOPT_FAILONERROR, true);   // treat HTTP status >= 400 as a failure
    curl_setopt($cl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/535.7 (KHTML, like Gecko) Chrome/16.0.912.77 Safari/535.7');
    curl_setopt($cl, CURLOPT_HTTPHEADER, $header);
    curl_setopt($cl, CURLOPT_REFERER, 'http://www.google.com');
    curl_setopt($cl, CURLOPT_ENCODING, 'gzip,deflate');
    curl_setopt($cl, CURLOPT_AUTOREFERER, false);
    curl_setopt($cl, CURLOPT_RETURNTRANSFER, 1);   // return the body instead of printing it
    curl_setopt($cl, CURLOPT_CONNECTTIMEOUT, 2);   // seconds to wait for the connection

    $url = 'http://pinterest.com/source/google.com/';

    curl_setopt($cl, CURLOPT_URL, $url);
    $output = curl_exec($cl);                      // string on success, false on failure
    if ($output === false) {
        $output = 'cURL error: ' . curl_error($cl);
    }
    curl_close($cl);
?>

<!DOCTYPE html>
<html>
    <head>
     <title>get pinterest</title>
    </head>
    <body>
     <pre><?php echo htmlspecialchars($output); ?></pre>
    </body>
</html>

Source

2012-03-12 17:27:07 adeneo

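file_get_contents(..) gives you what you see as the page source in the browser. It cannot give you content that is loaded via JavaScript. Your best bet is to find the AJAX calls in the page source. Alternatively, you can open the browser's developer tools and monitor the page's activity (in Chrome, Ctrl+Shift+J opens them).

Once you have the URL those requests are made to, you can fetch the relevant data directly with file_get_contents(..).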
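As a rough sketch of that idea, assuming the page's script loads further results from a URL that takes a `page` query parameter, one could request the pages one by one until an empty response comes back. The parameter name and the helper functions `fetch_all_pages` and `fetch_with_file_get_contents` are hypothetical; the real URL and parameters have to be read from the browser's network monitor.

```php
<?php
// Sketch only: the "page" parameter is an assumed name, not a documented Pinterest API.

/**
 * Fetch consecutive pages until one comes back empty or $maxPages is reached.
 * $fetch is any callable that takes a URL and returns the body, or false on failure.
 */
function fetch_all_pages(string $baseUrl, int $maxPages, callable $fetch): array
{
    $pages = [];
    for ($page = 1; $page <= $maxPages; $page++) {
        // Append the page number, respecting an existing query string.
        $sep = (strpos($baseUrl, '?') === false) ? '?' : '&';
        $url = $baseUrl . $sep . http_build_query(['page' => $page]);

        $body = $fetch($url);
        if ($body === false || trim($body) === '') {
            break; // no more results, or the request failed
        }
        $pages[] = $body;
    }
    return $pages;
}

/** Default fetcher: file_get_contents with a few browser-like request headers. */
function fetch_with_file_get_contents(string $url)
{
    $context = stream_context_create([
        'http' => [
            'method'  => 'GET',
            'header'  => "User-Agent: Mozilla/5.0\r\nX-Requested-With: XMLHttpRequest\r\n",
            'timeout' => 10,
        ],
    ]);
    return @file_get_contents($url, false, $context);
}
```

It could then be called as `fetch_all_pages('http://pinterest.com/source/google.com/', 20, 'fetch_with_file_get_contents')`. Passing the fetcher as a callable keeps the paging loop independent of the HTTP client, so the cURL setup from the other answer could be swapped in.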

Source

2012-03-12 17:11:39 SuperSaiyan

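No, it won't work. You need to simulate the whole session and keep track of cookies and everything. – miki

It shouldn't be hard. Save the cookies you get from the first request and send them with subsequent requests. If the server requires it, you can add headers to the request to make it look 'browser-like'. – SuperSaiyan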
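A minimal sketch of the cookie handling described in that last comment, using only file_get_contents and a stream context. The helper names `cookies_from_headers` and `fetch_with_cookies` are made up for illustration, and the parser is deliberately simplified: it keeps only the `name=value` pair of each `Set-Cookie` header and ignores attributes such as Path, Domain and Expires.

```php
<?php
// Sketch only: a simplified cookie parser that ignores Path, Domain, Expires, etc.

/**
 * Build a "Cookie:" request header value from raw response header lines,
 * such as those PHP places in $http_response_header after file_get_contents().
 */
function cookies_from_headers(array $responseHeaders): string
{
    $cookies = [];
    foreach ($responseHeaders as $line) {
        if (stripos($line, 'Set-Cookie:') !== 0) {
            continue; // not a Set-Cookie header
        }
        // Keep only the leading "name=value" pair, dropping attributes after ';'.
        $pair = trim(explode(';', substr($line, strlen('Set-Cookie:')), 2)[0]);
        if (strpos($pair, '=') !== false) {
            [$name, $value] = explode('=', $pair, 2);
            $cookies[$name] = $value; // a later cookie replaces an earlier one of the same name
        }
    }
    $parts = [];
    foreach ($cookies as $name => $value) {
        $parts[] = $name . '=' . $value;
    }
    return implode('; ', $parts);
}

/** Fetch $url, sending $cookieHeader if it is non-empty; returns [body, response headers]. */
function fetch_with_cookies(string $url, string $cookieHeader = ''): array
{
    $headers = "User-Agent: Mozilla/5.0\r\n";
    if ($cookieHeader !== '') {
        $headers .= "Cookie: $cookieHeader\r\n";
    }
    $context = stream_context_create(['http' => ['header' => $headers, 'timeout' => 10]]);
    $body = @file_get_contents($url, false, $context);
    // The HTTP wrapper fills $http_response_header in the local scope.
    return [$body, $http_response_header ?? []];
}
```

The first call would be `[$html, $headers] = fetch_with_cookies($url);`, and later requests would pass `cookies_from_headers($headers)` as the second argument. With the cURL extension, the CURLOPT_COOKIEJAR and CURLOPT_COOKIEFILE options offer the same persistence without manual parsing.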
