2014-11-18

Python: HTML table contents

I'm trying to scrape this website, but I keep getting an error when I try to print just the contents of the table.

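import urllib2
from bs4 import BeautifulSoup  # assumption: BeautifulSoup 4; the original import is not shown

soup = BeautifulSoup(urllib2.urlopen('http://clinicaltrials.gov/show/NCT01718158').read())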

print soup('table')[6].prettify() 


for row in soup('table')[6].findAll('tr'): 
    tds = row('td') 
    print tds[0].string,tds[1].string 

IndexError        Traceback (most recent call last) 
<ipython-input-70-da84e74ab3b1> in <module>() 
    1 for row in soup('table')[6].findAll('tr'): 
    2  tds = row('td') 
    3  print tds[0].string,tds[1].string 
    4 

IndexError: list index out of range 

Answers


The table has a header row, with <th> header elements rather than <td> cells. Your code assumes every row contains at least two <td> elements, and that assumption fails for the first row.
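
To see why the first row fails, it can help to count the cells in each row. The following is only a minimal sketch, assuming Python 3 and BeautifulSoup 4 (`bs4`); the sample HTML and the helper name `cell_counts` are made up for illustration and are not taken from the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page's table: a <th> header row, then data rows.
SAMPLE_HTML = """
<table>
  <tr><th colspan="2">Header</th></tr>
  <tr><td>Responsible Party:</td><td>Bristol-Myers Squibb</td></tr>
  <tr><td>Last Updated:</td><td>November 7, 2014</td></tr>
</table>
"""

def cell_counts(html):
    """Return a (number of <th>, number of <td>) pair for every <tr> in html."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        (len(row.find_all("th")), len(row.find_all("td")))
        for row in soup.find_all("tr")
    ]

print(cell_counts(SAMPLE_HTML))  # [(1, 0), (0, 2), (0, 2)]
```

The header row has no <td> cells at all, so indexing tds[0] on it raises the IndexError from the question.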

You could skip rows that don't have enough <td> elements:

for row in soup('table')[6].findAll('tr'): 
    tds = row('td') 
    if len(tds) < 2: 
        continue
    print tds[0].string, tds[1].string 

at which point you get this output:

>>> for row in soup('table')[6].findAll('tr'): 
...     tds = row('td')
...     if len(tds) < 2:
...         continue
...     print tds[0].string, tds[1].string
... 
Responsible Party: Bristol-Myers Squibb 
ClinicalTrials.gov Identifier: None 
Other Study ID Numbers: AI452-021, 2011‐005409‐65 
Study First Received: October 29, 2012 
Last Updated: November 7, 2014 
Health Authority: None 

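Two of the value cells print as None because .string only returns a value when a tag has a single child, and those cells contain several strings. The last row, for example, holds text interspersed with <br/> elements. You could use the element.strings generator to extract all the strings and join them with newlines. I'd filter out the whitespace-only strings first, though; filter(unicode.strip, ...) keeps only strings that are non-empty after stripping: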

>>> for row in soup('table')[6].findAll('tr'): 
...     tds = row('td')
...     if len(tds) < 2:
...         continue
...     print tds[0].string, '\n'.join(filter(unicode.strip, tds[1].strings))
... 
Responsible Party: Bristol-Myers Squibb 
ClinicalTrials.gov Identifier: NCT01718158 
History of Changes 
Other Study ID Numbers: AI452-021, 2011‐005409‐65 
Study First Received: October 29, 2012 
Last Updated: November 7, 2014 
Health Authority: United States: Institutional Review Board 
United States: Food and Drug Administration 
Argentina: Administracion Nacional de Medicamentos, Alimentos y Tecnologia Medica 
France: Afssaps - Agence française de sécurité sanitaire des produits de santé (Saint-Denis) 
Germany: Federal Institute for Drugs and Medical Devices 
Germany: Ministry of Health 
Israel: Israeli Health Ministry Pharmaceutical Administration 
Israel: Ministry of Health 
Italy: Ministry of Health 
Italy: National Bioethics Committee 
Italy: National Institute of Health 
Italy: National Monitoring Centre for Clinical Trials - Ministry of Health 
Italy: The Italian Medicines Agency 
Japan: Pharmaceuticals and Medical Devices Agency 
Japan: Ministry of Health, Labor and Welfare 
Korea: Food and Drug Administration 
Poland: National Institute of Medicines 
Poland: Ministry of Health 
Poland: Ministry of Science and Higher Education 
Poland: Office for Registration of Medicinal Products, Medical Devices and Biocidal Products 
Russia: FSI Scientific Center of Expertise of Medical Application 
Russia: Ethics Committee 
Russia: Ministry of Health of the Russian Federation 
Spain: Spanish Agency of Medicines 
Taiwan: Department of Health 
Taiwan: National Bureau of Controlled Drugs 
United Kingdom: Medicines and Healthcare Products Regulatory Agency 
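
On Python 3, where `unicode` no longer exists, a similar result can probably be had with the `stripped_strings` generator, which both strips each string and drops the empty ones. This is a rough sketch assuming Python 3 and BeautifulSoup 4 (`bs4`); the sample HTML and the helper name `cell_text` are hypothetical, and the real page's markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment imitating a value cell whose text is split by <br/> tags.
SAMPLE_HTML = """
<table>
  <tr><th colspan="2">Header</th></tr>
  <tr>
    <td>Health Authority:</td>
    <td>
      United States: Institutional Review Board<br/>
      United States: Food and Drug Administration
    </td>
  </tr>
</table>
"""

def cell_text(cell):
    """Join every non-empty, whitespace-stripped string in a cell with newlines."""
    return "\n".join(cell.stripped_strings)

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
for row in soup.find_all("tr"):
    tds = row.find_all("td")
    if len(tds) < 2:
        # Skip the header row, which has <th> cells only.
        continue
    print(tds[0].get_text(strip=True), cell_text(tds[1]))
```

Unlike the filter() call above, this also trims leading and trailing whitespace from each string it keeps.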

MARTIJN! You are a "god" with a small "g". Thank you very much.

Martijn, how would I get at the contents of the table before this one, the one under 'Show 77 Study Locations' on the page 'http://clinicaltrials.gov/show/NCT01718158'? It is a table, but I don't know why I can't find it with table[variable]. Thanks.