2016-07-19

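XPath returns an empty list when used with requests

I am trying to learn web scraping. I need to get all the URLs on this page: http://www.99acres.com/rent-property-in-chennai-ffid

First I need to sort the listings by most recent, which my code does by replicating the getresults_ajax POST request. The XPath returns valid results in the Chrome console, but in my code it returns an empty list.

I know that replicating requests can be tedious, and I have used Selenium with PhantomJS to scrape dynamic pages, but sorting the content and then getting the data out of the response seemed tricky that way.

My code: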

import requests
from lxml import html

d = {
    'src': 'SORTING_date_d', 
    'static_search': 'true',  
    '': 'undefined', 
    'sortby': 'date_d', 
    'lstAcnId': '8930791340597402', 
    'encrypted_input': 'UiB8IFFTIHwgUiB8IzIjICB8IGNoZW5uYWkgIzMjfCAgfCBDUDMyIzIyIyB8IDI1MTU3NTg2IHwgIHwgMzIgfCM1IyAgfCBSICM0MCN8ICA=', 
    'lstAcn': 'SEARCH', 
    'is_ajax': '1' 
} 

h = { 
    'Referrer': 'http://www.99acres.com/rent-property-in-chennai-ffid?orig_property_type=R&search_type=QS&search_location=CP32&pageid=QS&keyword_orig=chennai' 
} 

req = requests.post(url = 'http://www.99acres.com/do/quicksearch/getresults_ajax', data = d, headers = h) 
r = html.fromstring(req.text) 

#print('test 1' + str(req.text)) 

prices = r.xpath('//div[@title = "View property details"]') 

print('test %d' % len(prices)) 
# driver = webdriver.PhantomJS(executable_path = R'C:\Python27\selenium\webdriver\phantomjs-2.1.1-windows\bin\phantomjs.exe') 

for price in prices: 
    print('price is this ' + str(price)) 

Answer

If you print req.text, you can see that the response is JSON, not plain HTML:

{"html_ysf":" <div class=\"srp-ysfWrap boxSize\">\n\n\n\n  <diV. etc............. 
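To see why the XPath matches nothing on the raw text, here is a minimal offline sketch. The raw_body string is a hypothetical, heavily shortened stand-in for the real response, not actual site output; only the html2 key name and the title attribute are taken from this thread.

```python
import json

# Hypothetical, shortened stand-in for the body returned by getresults_ajax.
raw_body = '{"html2": "<div title=\\"View property details\\"><b>18,000</b></div>"}'

needle = 'title="View property details"'

# In the raw text every attribute quote is backslash-escaped by JSON,
# so the attribute the XPath looks for never appears in that form.
found_in_raw = needle in raw_body

# Decoding the JSON first yields the real markup.
markup = json.loads(raw_body)["html2"]
found_in_markup = needle in markup

print(found_in_raw, found_in_markup)
```

Parsing the raw text as HTML therefore gives the parser attributes with stray backslashes and quotes, which is consistent with the empty list seen in the question.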

So you just need to decode the JSON and pull the interesting HTML out of the html2 key to get what you want:

req = requests.post(url='http://www.99acres.com/do/quicksearch/getresults_ajax', data=d, headers=h) 
r = html.fromstring((req.json()["html2"])) 
prices = r.xpath('//div[@title = "View property details"]') 
print('test %d' % len(prices)) 
for price in prices:
    print('price is this ' + str(price))

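Each price is a div element, so str(price) only shows the element object. Serializing each one with html.tostring gives output like this: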

b'<div data-propid="Q26021619" data-pgid="QS" class="srpWrap " title="View property details" data-fsl="N">\n\t\t<input id="ajxPDFlg" type="hidden" value="najx">\n  <input id="dataSRPCLKTRK" type="hidden" value="ON">\n  <i class="uiIcon pLatinum"></i>\t\t<div class="wrapttl">\n\t\t\t<div class="_srpttl srpttl fwn wdthFix480 lf">\n    <b class="WebRupee f14 mr5"> &#8377;</b>    <b id="rs_Q26021619">18,000</b>\n    <a data-proppos="\'\'" id="desc_Q26021619" class="b wWrap" target="_blank" title="2 BHK, Residential Apartment for rent in Choolaimedu" href="/2-bhk-bedroom-apartment-flat-for-rent-in-choolaimedu-chennai-central-1000-sq-ft-spid-Q26021619" data-fsl="N">2 BHK, Residential Apartment for rent in Choolaimedu</a>   </div>\n   <i class="uline" data-maplatlngzm="13.06709,80.2195432,11" data-iwdesc=" Residential Apartment for rent in Choolaimedu" data-ttlurl="http://www.99acres.com/2-bhk-bedroom-apartment-flat-for-rent-in-choolaimedu-chennai-central-1000-sq-ft-spid-Q26021619" data-price="18,000," data-area="Super built-up ,1000,Sq.Ft." 
data-bedrm="2" data-bldname="On Request" title="View Map"><i class="uiIcon imap"></i><i class="ml_5 f13 vmid hverU">Map</i></i>   <div class="clr"></div>\n\t\t</div>\n  \n    \n\t\t<div class="srpDetail">\n\t\t\t<div class="srpImg rel">\n    <img class="imgBoxSrp lazy" alt="2 BHK, Residential Apartment for rent in Choolaimedu" width="208" height="150" data-original="http://static.99acres.com/images/srpimages/noproperty-new.png" src="http://static.99acres.com/images/i0.gif"><div class="imgCap" data-clk-json=\'{"sno":-1,"ids":"0;732;","phType":"PROP","index":0,"text":"Sri Sakthi Real Estate","classLabel":"Dealer","profileId":"1122559","bedroomNum":"2","src":"SRP"}\'><a class="trackVamRos" vamacttype="Locality_Video_Count" vamactsrc="RENT_SRP" data-trkctgry="CLICK_LOCALITY_VIDEO_LINK" data-blid="732" href="#" data-clk-json=\'{"vtag":"LOC","sno":-1,"tab":4,"ids":"0;732;","phType":"PROP","entity":"locimages","subtab":"LVIDEO","text":"Sri Sakthi Real Estate","classLabel":"Dealer","profileId":"1122559","bedroomNum":"2","src":"SRP"}\'>1 Locality Video</a><div class="clr"></div></div>\t\t\t</div>\n\t\t\t<div class="srpDataWrap"><span>Super built-up Area : <b>1000 Sq.Ft. 
</b></span><div class="clr pdt8"></div><span class="doElip">Society : <bclass>On Request</bclass></span><div class="sep clr mt3imp"></div><span><span>Highlights:&#160; </span> <span>On Rent&#160;</span><span> <span>/&#160;</span> 1 to 5 years old&#160;</span><span> <span>/&#160;</span> Unfurnished&#160;</span><span> <span>/&#160;</span> 2nd Floor (out of 3)&#160;</span></span><div class="sep clr"></div>\t\t\t\t<div class="lf f12 wBr">\n\t\t\t\t\t<b>Description :</b> \n     Near gandhi road\nGood locality, Calm atmosphere\nCall for more details\t\t\t\t</div>\n                 <div class="rel clr">\n      <div class="lf mt13 mr13">Features: </div>\n      <div class="iconDiv fc_icons fcInit" attr="4,5,24,">\n      <i class="i4" value="Reserved Parking">&#160;</i><i class="i5" value="Feng Shui/Vaastu Compliant">&#160;</i><i class="i24" value="Water Storage">&#160;</i>      </div>\n       \n      <div class="LyrIcon clkEvntStp top0imp"></div>\n    </div>\n     \t\t\t</div>\n   <div class="clr p5"></div>\n   <div class="lf f13 hm10 mb5">Dealer : <a data-pid="1122559" class="hverU blkImp srpTplTrck" title="Sri Sakthi Real Estate , Chennai Central" target="_blank" href="/sri-sakthi-real-estate-chennai-central-drid-1122559">Sri Sakthi Real Estate</a>       &#160;&#160;&#160;&#160;Posted : Today      \n     </div> \n    \t\t</div>\n  <div class="clr"></div>\n   <div data-srptrk="ntrck" class="srpAction m10 mt5">\n  \t\t<a data-mxid="" data-apid="1122559" data-mc="N" data-rc="R" data-cl="Dealer" data-pgid="QS" href="javascript:void(0);" class="srpBlue f13 mr10 lf cntClk" title="Send E-mail &amp; SMS"> Contact Dealer <i>FREE</i></a><a data-pgid="QS" data-src="listing rank" data-lst="P" data-sms="RGVhciBBRERfQlVZRVJOQU1FX0hFUkUsIHlvdSBtYXkgY29udGFjdCBCYWJ1IGF0ICs5MS05Nzg5MDc0NzQxIGZvciBJTlIgMTggSyAxMDAwIFNxLiBGdC4gRmxhdCBpbiBDaG9vbGFpbWVkdS4=" data-trksrc="listing rank" data-ttc="" href="javascript:void(0);" class="srpWhite f13 mr10 lf vpn" id="viewphnoQ26021619" title="View 
Phone Number">View Phone Number</a><div data-src="listing rank" id="prop_Q26021619" class="sl_container blkImp f15 lf mt5 mr10"><span class="sl_star_empty_container" title="Shortlist this property"><i class="lf uiIcon sl_star_empty"></i><span class="lf m5">Shortlist</span></span></div>\t <div class="lf mt5 rptLtng" data-cl="A" data-md="R" data-pid="1122559" data-proptype="1" data-photocount="0" data-rescom="R">\n\t\t<div class="row dwnSrp"> \n\t\t<i class="spdpIcn repot_acu"></i> \n \t\t<a class="f13 b delCh blLink">Report problem with listing</a>\n\t </div>\n\t </div>\n      </div>\n    <div class="abs verifyLbl ViconPosSrp">\n   <div id="tooltipSociety" class="infoTip2 fwn f13 ital r5 hide VlyrPosSrp">\n    Learn about our verification process <a id="verify_process_info" class="blLink uLine" href="javascript:void(0)" style="text-decoration:underline">here</a>.\n     <i class="ver-arrow-down abs" style="left: 80px; bottom: -12px;"></i>\n   </div>\n   <i class="uiIcon verified mt8"></i>\n  </div>\n  \t\t<div class="clr pdt10"></div>\n </div>  \n\n' 
b'<div data-propid="X22163381" data-pgid="QS" class="srpWrap " title="View property details" data-fsl="N">\n\t\t<input id="ajxPDFlg" type="hidden" value="najx">\n  <input id="dataSRPCLKTRK" type="hidden" value="ON">\n  <i class="uiIcon pLatinum"></i>\t\t<div class="wrapttl">\n\t\t\t<div class="_srpttl srpttl fwn wdthFix480 lf">\n    <b class="WebRupee f14 mr5"> &#8377;</b>    <b id="rs_X22163381">22,000</b>\n    <a data-proppos="\'\'" id="desc_X22163381" class="b wWrap" target="_blank" title="2 BHK, Residential Apartment for rent in Choolaimedu" href="/2-bhk-bedroom-apartment-flat-for-rent-in-choolaimedu-chennai-central-1000-sq-ft-r2-spid-X22163381" data-fsl="N">2 BHK, Residential Apartment for rent in Choolaimedu</a>   </div>\n   <i class="uline" data-maplatlngzm="13.0673818,80.2213615,11" data-iwdesc=" Residential Apartment for rent in Choolaimedu" data-ttlurl="http://www.99acres.com/2-bhk-bedroom-apartment-flat-for-rent-in-choolaimedu-chennai-central-1000-sq-ft-r2-spid-X22163381" data-price="22,000, @ &lt;span class=WebRupee&gt;&#8377; &lt;/span&gt;22/ Sq.Ft." data-area="Built-up ,1000,Sq.Ft." 
data-bedrm="2" data-bldname="On Request" title="View Map"><i class="uiIcon imap"></i><i class="ml_5 f13 vmid hverU">Map</i></i>   <div class="clr"></div>\n\t\t</div>\n  \n    \n\t\t<div class="srpDetail">\n\t\t\t<div class="srpImg rel">\n    <img class="imgBoxSrp lazy" alt="2 BHK, Residential Apartment for rent in Choolaimedu" width="208" height="150" data-original="http://static.99acres.com/images/srpimages/noproperty-new.png" src="http://static.99acres.com/images/i0.gif"><div class="imgCap" data-clk-json=\'{"sno":-1,"ids":"0;732;","phType":"PROP","index":0,"text":"Sri Sakthi Real Estate","classLabel":"Dealer","profileId":"1122559","bedroomNum":"2","src":"SRP"}\'><a class="trackVamRos" vamacttype="Locality_Video_Count" vamactsrc="RENT_SRP" data-trkctgry="CLICK_LOCALITY_VIDEO_LINK" data-blid="732" href="#" data-clk-json=\'{"vtag":"LOC","sno":-1,"tab":4,"ids":"0;732;","phType":"PROP","entity":"locimages","subtab":"LVIDEO","text":"Sri Sakthi Real Estate","classLabel":"Dealer","profileId":"1122559","bedroomNum":"2","src":"SRP"}\'>1 Locality Video</a><div class="clr"></div></div>\t\t\t</div>\n\t\t\t<div class="srpDataWrap"><span>Built-up Area : <b>1000 Sq.Ft. 
</b></span><div class="clr pdt8"></div><span class="doElip">Society : <bclass>On Request</bclass></span><div class="sep clr mt3imp"></div><span><span>Highlights:&#160; </span> <span>On Rent&#160;</span><span> <span>/&#160;</span> 1 to 5 years old&#160;</span><span> <span>/&#160;</span> Furnished&#160;</span><span> <span>/&#160;</span> 1st Floor (out of 4)&#160;</span></span><div class="sep clr"></div>\t\t\t\t<div class="lf f12 wBr">\n\t\t\t\t\t<b>Description :</b> \n     2bhk house on rent in choolaimedu , Gill nagar area with all nessesary facilties.\t\t\t\t</div>\n             \t\t\t</div>\n   <div class="clr p5"></div>\n   <div class="lf f13 hm10 mb5">Dealer : <a data-pid="1122559" class="hverU blkImp srpTplTrck" title="Sri Sakthi Real Estate , Chennai Central" target="_blank" href="/sri-sakthi-real-estate-chennai-central-drid-1122559">Sri Sakthi Real Estate</a>       &#160;&#160;&#160;&#160;Posted : Today      \n     </div> \n    \t\t</div>\n  <div class="clr"></div>\n   <div data-srptrk="ntrck" class="srpAction m10 mt5">\n  \t\t<a data-mxid="" data-apid="1122559" data-mc="N" data-rc="R" data-cl="Dealer" data-pgid="QS" href="javascript:void(0);" class="srpBlue f13 mr10 lf cntClk" title="Send E-mail &amp; SMS"> Contact Dealer <i>FREE</i></a><a data-pgid="QS" data-src="listing rank" data-lst="P" data-sms="RGVhciBBRERfQlVZRVJOQU1FX0hFUkUsIHlvdSBtYXkgY29udGFjdCBCYWJ1IGF0ICs5MS05Nzg5MDc0NzQxIGZvciBJTlIgMjIgSyAxMDAwIFNxLiBGdC4gRmxhdCBpbiBDaG9vbGFpbWVkdS4=" data-trksrc="listing rank" data-ttc="" href="javascript:void(0);" class="srpWhite f13 mr10 lf vpn" id="viewphnoX22163381" title="View Phone Number">View Phone Number</a><div data-src="listing rank" id="prop_X22163381" class="sl_container blkImp f15 lf mt5 mr10"><span class="sl_star_empty_container" title="Shortlist this property"><i class="lf uiIcon sl_star_empty"></i><span class="lf m5">Shortlist</span></span></div>\t <div class="lf mt5 rptLtng" data-cl="A" data-md="R" data-pid="1122559" data-proptype="1" 
data-photocount="0" data-rescom="R">\n\t\t<div class="row dwnSrp"> \n\t\t<i class="spdpIcn repot_acu"></i> \n \t\t<a class="f13 b delCh blLink">Report problem with listing</a>\n\t </div>\n\t </div>\n      </div>\n    <div class="abs verifyLbl ViconPosSrp">\n   <div id="tooltipSociety" class="infoTip2 fwn f13 ital r5 hide VlyrPosSrp">\n    Learn about our verification process <a id="verify_process_info" class="blLink uLine" href="javascript:void(0)" style="text-decoration:underline">here</a>.\n     <i class="ver-arrow-down abs" style="left: 80px; bottom: -12px;"></i>\n   </div>\n   <i class="uiIcon verified mt8"></i>\n  </div>\n  \t\t<div class="clr pdt10"></div>\n </div>  \n\n' 

So you still need to extract what you want from each element. The output above comes from this loop:

for price in prices:
    print(html.tostring(price))

This prints the serialized markup of each listing, as shown above.
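Since the question asks for the URLs, one possible way to pull the price and link out of each div is relative XPath. This is a rough sketch that assumes the markup keeps the shape seen in the output above (a b element whose id starts with rs_ and an a element whose id starts with desc_). SNIPPET is a trimmed copy of two listings from that output, and extract_listings and BASE_URL are hypothetical names introduced for illustration.

```python
from urllib.parse import urljoin

from lxml import html

BASE_URL = "http://www.99acres.com/"  # assumed base for the relative hrefs

# Trimmed copy of two listings from the output above (most attributes dropped).
SNIPPET = '''
<div data-propid="Q26021619" class="srpWrap " title="View property details">
  <b id="rs_Q26021619">18,000</b>
  <a id="desc_Q26021619" href="/2-bhk-bedroom-apartment-flat-for-rent-in-choolaimedu-chennai-central-1000-sq-ft-spid-Q26021619">2 BHK, Residential Apartment for rent in Choolaimedu</a>
</div>
<div data-propid="X22163381" class="srpWrap " title="View property details">
  <b id="rs_X22163381">22,000</b>
  <a id="desc_X22163381" href="/2-bhk-bedroom-apartment-flat-for-rent-in-choolaimedu-chennai-central-1000-sq-ft-r2-spid-X22163381">2 BHK, Residential Apartment for rent in Choolaimedu</a>
</div>
'''


def extract_listings(markup):
    """Return one dict per listing div found in the given HTML string."""
    tree = html.fromstring(markup)
    listings = []
    for div in tree.xpath('//div[@title="View property details"]'):
        # The leading "." keeps each query inside the current listing div.
        prices = div.xpath('.//b[starts-with(@id, "rs_")]/text()')
        hrefs = div.xpath('.//a[starts-with(@id, "desc_")]/@href')
        listings.append({
            "propid": div.get("data-propid"),
            "price": prices[0].strip() if prices else None,
            "url": urljoin(BASE_URL, hrefs[0]) if hrefs else None,
        })
    return listings


for listing in extract_listings(SNIPPET):
    print(listing["propid"], listing["price"], listing["url"])
```

With the live response you would pass req.json()["html2"] instead of SNIPPET. The selectors may need adjusting if the site changes its markup.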