How to interpret HTML data in Python (print it the way a browser renders it)

I have an HTML file with the following data structure:

<tr> 
    <td valign="top"><img src="img.jpg"></td> 
    <td><a href="file.zip">file.zip</a></td> 
    <td align="right">24-Apr-2013 12:42 </td> 
    <td align="right">200K</td> 
</tr> 
...

When viewed in Firefox, it basically looks like a simple table like this:

file.zip 24-Apr-2013 12:42 200K


I want to extract these three values (file name, date, size), for example with split(), but I'm wondering whether I can print the "HTML-interpreted form" in Python, something like this:

import xyz  # hypothetical module, only to illustrate the idea
print(xyz.htmlinterpreted("htmlfile.html"))
# desired output: file.zip 24-Apr-2013 12:42 200K

That way I could easily split the data with split(" "). Is this possible in Python?
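A quick check of the split idea, assuming the rendered text of one row is a single-space-separated line such as "file.zip 24-Apr-2013 12:42 200K": splitting on whitespace yields four fields rather than three, because the date and the time are separated by a space, so they have to be rejoined. parse_rendered_row is a hypothetical helper name used only for illustration.

```python
def parse_rendered_row(line):
    """Split one rendered row into (file name, date, size)."""
    # str.split() with no argument splits on runs of whitespace, so the
    # date and the time come out as two separate fields here.
    name, date, time, size = line.split()
    # Rejoin date and time so the result has the three values asked for.
    return name, date + " " + time, size


print(parse_rendered_row("file.zip 24-Apr-2013 12:42 200K"))
# ('file.zip', '24-Apr-2013 12:42', '200K')
```

split() without an argument is more forgiving than split(" ") when the rendered text contains repeated spaces, but this approach still breaks for file names that contain spaces, which is one reason to read the values from the HTML cells directly.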


2013-04-24 Johnny

Use an HTML parser. BeautifulSoup makes this a breeze:

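from bs4 import BeautifulSoup

with open("htmlfile.html") as f:  # the file name from the question
    html_source = f.read()

soup = BeautifulSoup(html_source, "html.parser")  # explicit parser avoids a warning
print(list(soup.stripped_strings))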

Demo:

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup('''<tr><td valign="top"><img src="img.jpg"></td><td><a href="file.zip">file.zip</a></td><td align="right">24-Apr-2013 12:42 </td><td align="right">200K</td></tr>''', "html.parser")
>>> print(list(soup.stripped_strings))
['file.zip', '24-Apr-2013 12:42', '200K']
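Going one step further toward the three values the question asks for: this sketch groups the text by table row instead of flattening the whole document, so each row comes out as a (file name, date, size) tuple. It assumes every data row has the same four-cell shape as the question's example, with only the icon cell empty. find_all and get_text(strip=True) are standard BeautifulSoup calls; extract_rows is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# The row from the question, wrapped in a table.
html_source = """
<table>
<tr>
    <td valign="top"><img src="img.jpg"></td>
    <td><a href="file.zip">file.zip</a></td>
    <td align="right">24-Apr-2013 12:42 </td>
    <td align="right">200K</td>
</tr>
</table>
"""


def extract_rows(html):
    """Return one (file name, date, size) tuple per <tr> row."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        # get_text(strip=True) trims whitespace; the icon cell yields "".
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        cells = [c for c in cells if c]
        # Skip rows that don't match the pattern (e.g. header rows using <th>).
        if len(cells) == 3:
            rows.append(tuple(cells))
    return rows


for name, date, size in extract_rows(html_source):
    print(name, date, size)
```

Grouping by <tr> matters once the file has more than one row: stripped_strings on the whole document returns one flat list, which you would have to chunk into threes yourself, and that silently goes wrong if any cell is empty.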


2013-04-24 18:26:20

Beat me to it by seconds. – ecline6
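If installing BeautifulSoup is not an option, the standard library's html.parser module can do the same job with more code. This is a swapped-in technique, not part of the answer above. RowTextParser is a hypothetical class name, and the sketch assumes every <td> and <tr> has an explicit closing tag, as in the question's example, because HTMLParser only reports tags as it sees them and does not infer omitted end tags.

```python
from html.parser import HTMLParser


class RowTextParser(HTMLParser):
    """Collect the non-empty text of each <td>, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows: lists of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            text = "".join(self._cell).strip()
            if text:  # drop empty cells such as the icon column
                self._row.append(text)
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text inside <a> etc. still lands here while a <td> is open.
        if self._cell is not None:
            self._cell.append(data)


html_source = ('<tr><td valign="top"><img src="img.jpg"></td>'
               '<td><a href="file.zip">file.zip</a></td>'
               '<td align="right">24-Apr-2013 12:42 </td>'
               '<td align="right">200K</td></tr>')

parser = RowTextParser()
parser.feed(html_source)
parser.close()
print(parser.rows)  # [['file.zip', '24-Apr-2013 12:42', '200K']]
```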
