Web scraping using BeautifulSoup
The following div tag was taken directly from espncricinfo.com:
<div id="rectPlyr_Playerlistt20" style="display: none; visibility: hidden;
background:url(http://i.imgci.com/espncricinfo/ciPlayerTablebottom-bg.gif) bottom left no-repeat;">
<table class="playersTable" cellpadding="0" cellspacing="0" style="margin-top:15px; margin-bottom:14px;">
<td class="divider"><a href="/ci/content/player/26421.html">R Ashwin</a></td>
<td class="divider"><a href="/ci/content/player/27223.html">STR Binny</a></td>
<td class=""><a href="/ci/content/player/625383.html">JJ Bumrah</a></td>
</tr>
<tr class="odd">
<td class="divider"><a href="/ci/content/player/430246.html">YS Chahal</a></td>
<td class="divider"><a href="/ci/content/player/290727.html">R Dhawan</a></td>
<td class=""><a href="/ci/content/player/28235.html">S Dhawan</a></td>
</tr>
<tr class="">
<td class="divider"><a href="/ci/content/player/28081.html">MS Dhoni</a></td>
<td class="divider"><a href="/ci/content/player/28671.html">FY Fazal</a></td>
<td class=""><a href="/ci/content/player/28763.html">G Gambhir</a></td>
</tr>
<tr class="odd">
<td class="divider"><a href="/ci/content/player/234675.html">RA Jadeja</a></td>
<td class="divider"><a href="/ci/content/player/290716.html">KM Jadhav</a></td>
<td class=""><a href="/ci/content/player/253802.html">V Kohli</a></td>
</tr>
<tr class="">
<td class="divider"><a href="/ci/content/player/277955.html">DS Kulkarni</a></td>
<td class="divider"><a href="/ci/content/player/326016.html">B Kumar</a></td>
<td class=""><a href="/ci/content/player/398506.html">Mandeep Singh</a></td>
</tr>
<tr class="odd">
<td class="divider"><a href="/ci/content/player/31107.html">A Mishra</a></td>
<td class="divider"><a href="/ci/content/player/481896.html">Mohammed Shami</a></td>
<td class=""><a href="/ci/content/player/290630.html">MK Pandey</a></td>
</tr>
<tr class="">
<td class="divider"><a href="/ci/content/player/554691.html">AR Patel</a></td>
<td class="divider"><a href="/ci/content/player/32540.html">CA Pujara</a></td>
<td class=""><a href="/ci/content/player/277916.html">AM Rahane</a></td>
</tr>
<tr class="odd">
<td class="divider"><a href="/ci/content/player/422108.html">KL Rahul</a></td>
<td class="divider"><a href="/ci/content/player/33141.html">AT Rayudu</a></td>
<td class=""><a href="/ci/content/player/279810.html">WP Saha</a></td>
</tr>
<tr class="">
<td class="divider"><a href="/ci/content/player/236779.html">I Sharma</a></td>
<td class="divider"><a href="/ci/content/player/34102.html">RG Sharma</a></td>
<td class=""><a href="/ci/content/player/537126.html">BB Sran</a></td>
</tr>
<tr class="odd">
<td class="divider"><a href="/ci/content/player/390484.html">JD Unadkat</a></td>
<td class="divider"><a href="/ci/content/player/237095.html">M Vijay</a></td>
<td class=""><a href="/ci/content/player/376116.html">UT Yadav</a></td>
</tr>
<tr class="">
</tr>
</table>
</div>
I am trying to scrape the HTML above with this code:
from bs4 import BeautifulSoup
import os
import urllib2
BASE_URL = "http://www.espncricinfo.com"
espn_ = urllib2.urlopen("http://www.espncricinfo.com/ci/content/player/index.html?country=6")
soup = BeautifulSoup(espn_, 'html.parser')
# print soup.prettify().encode('utf-8')
t20 = soup.find_all('div', {"id": "rectPlyr_Playerlistt20"})
for row in t20:
    print(row.find('tr', {"class": "odd"}))
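Assume the code above is run against the given URL. When I scrape, the output is None.
Even when I print t20, I do not get the full output; only the rows up to JJ Bumrah, i.e. the first <tr>, are shown.
If this is not clear from the data above, go to the URL passed to espn_, select team India, and open the t20 tab. I want to scrape the href links of all players listed under the t20 tab.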
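For reference, here is a minimal sketch of the result I am after, written for Python 3 and run against a shortened copy of the markup above rather than the live page. The names SAMPLE_HTML and player_links are hypothetical and exist only for this illustration. It collects the <a> tags inside the div directly instead of going through the <tr> rows, so it does not depend on the rows being well-formed. The parser argument is exposed because Beautiful Soup's parsers handle malformed markup differently, so trying "lxml" or "html5lib" (if installed) on the real page may be worth a test; I have not confirmed that this is the cause of the truncated output.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "http://www.espncricinfo.com"

# Shortened copy of the markup from the question (assumption: the live
# page has the same structure, including the missing opening <tr>).
SAMPLE_HTML = """
<div id="rectPlyr_Playerlistt20" style="display: none;">
<table class="playersTable">
<td class="divider"><a href="/ci/content/player/26421.html">R Ashwin</a></td>
<td class=""><a href="/ci/content/player/625383.html">JJ Bumrah</a></td>
</tr>
<tr class="odd">
<td class="divider"><a href="/ci/content/player/430246.html">YS Chahal</a></td>
</tr>
</table>
</div>
"""


def player_links(html, div_id="rectPlyr_Playerlistt20", parser="html.parser"):
    """Return (name, absolute URL) pairs for every link inside the given div."""
    soup = BeautifulSoup(html, parser)
    container = soup.find("div", id=div_id)
    if container is None:
        # The div was not found (different page, or the parser dropped it).
        return []
    # Walk the anchors directly; row structure is ignored on purpose.
    return [
        (a.get_text(strip=True), urljoin(BASE_URL, a["href"]))
        for a in container.find_all("a", href=True)
    ]


if __name__ == "__main__":
    for name, url in player_links(SAMPLE_HTML):
        print(name, url)
```

On the real page, the html argument would be the response body fetched from the URL in the question, and the other tabs would presumably use a different div id.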