Remove all HTML tags using BeautifulSoup4 (Python 3.4)

I've been trying to solve this for a while, but the only way I can find to do it is with a convoluted while loop. Essentially, I want to take the following input:

"<td colspan='2' class='ToEx'>This is a test (<i> to see </i> this works) and I really hope it does</td>"

and have the output be:

"This is a test (to see if this works) and I really hope it does"

In other words, I want to remove every "<" and ">" and everything between them. The best I can manage with a few commands is:

"This is a test (<i> to see </i> this works) and I really hope it does"

but I'm still left with these annoying <i></i> tags. Here is my code:

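from bs4 import BeautifulSoup

text = "<td colspan='2' class='ToEx'>This is a test (<i> to see </i> this works) and I really hope it does</td>"
# Name the parser explicitly; without one, bs4 guesses and emits a warning
soup = BeautifulSoup(text, "html.parser")
content = soup.find_all("td", "ToEx")  # <td> tags with class "ToEx"
content[0].renderContents()  # inner HTML as bytes, so the <i> tags are still there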


2014-07-06 RRR
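The approach the question describes (drop each "<", each ">", and everything between them) needs only a single pass with a flag, not a nested while loop. This is an illustrative sketch only: `strip_tags_naive` is a hypothetical helper, and it assumes no literal "<" or ">" appears inside attribute values, comments, or text, and it leaves entities such as `&amp;` undecoded. Those gaps are why a real parser such as BeautifulSoup is the safer choice.

```python
def strip_tags_naive(markup: str) -> str:
    """Hypothetical helper: drop '<', '>' and everything between them.

    Assumes no literal '<' or '>' inside attribute values, comments,
    or text; HTML entities are left exactly as written.
    """
    out = []
    inside_tag = False
    for ch in markup:
        if ch == "<":
            inside_tag = True        # start skipping tag contents
        elif ch == ">":
            inside_tag = False       # tag closed; resume copying text
        elif not inside_tag:
            out.append(ch)           # ordinary text character
    return "".join(out)


text = "<td colspan='2' class='ToEx'>This is a test (<i> to see </i> this works) and I really hope it does</td>"
print(strip_tags_naive(text))
```

The spaces that sat inside `<i> to see </i>` are kept as they are, so the result reads `This is a test ( to see  this works) and I really hope it does`.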

Just print the tag's .text attribute; it gives you the tag's text:

print(content[0].text)

Which gives the output:

This is a test (to see this works) and I really hope it does
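The difference between the question's `renderContents()` call and `.text` can be seen side by side. This is a minimal sketch, assuming the bundled `"html.parser"` backend. `decode_contents()` returns the inner markup as a `str` (the `str` counterpart of `renderContents()`/`encode_contents()`, which return bytes), while `.text` is a property that returns the same value as `get_text()`.

```python
from bs4 import BeautifulSoup

text = "<td colspan='2' class='ToEx'>This is a test (<i> to see </i> this works) and I really hope it does</td>"
soup = BeautifulSoup(text, "html.parser")

# class_ filters on the CSS class, like the positional "ToEx" in the question
cell = soup.find("td", class_="ToEx")

# Inner markup as a str: child tags such as <i> are kept
inner_html = cell.decode_contents()
# Text only: every string inside the cell, concatenated as-is
inner_text = cell.text

print(inner_html)
print(inner_text)
```

`.text` returns the strings exactly as they appear in the markup, so the spaces from inside `<i>` are still present: `This is a test ( to see  this works) and I really hope it does`.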


2014-07-06 06:34:45

I tried that first and got an error, so something else must be going on. I'll look into it a bit more, thanks. – RRR

@user3757519 What error do you get when you run it? –

I would use get_text(); it is designed for exactly this kind of situation:

from bs4 import BeautifulSoup

text = "<td colspan='2' class='ToEx'>This is a test (<i> to see </i> this works) and I really hope it does</td>"
soup = BeautifulSoup(text, "html.parser")  # explicit parser avoids a guessed-parser warning
print(soup.get_text())

This should work, as per the documentation.
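`get_text()` also accepts a separator string and a `strip` flag, which help when the raw result carries stray whitespace from inside the tags. This is a sketch, assuming the `"html.parser"` backend; the final `" ".join(raw.split())` step is plain Python string handling, not a BeautifulSoup feature.

```python
from bs4 import BeautifulSoup

text = "<td colspan='2' class='ToEx'>This is a test (<i> to see </i> this works) and I really hope it does</td>"
soup = BeautifulSoup(text, "html.parser")

# Default: the text nodes joined with no separator, whitespace untouched
raw = soup.get_text()

# separator=" ", strip=True: strip each text node, then join them with a space
stripped = soup.get_text(" ", strip=True)

# Plain Python: collapse every run of whitespace to a single space
collapsed = " ".join(raw.split())

print(repr(raw))
print(repr(stripped))
print(repr(collapsed))
```

Both cleaned-up variants give `This is a test ( to see this works) and I really hope it does`; the space after "(" comes from the source markup, so removing it would need extra string processing.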

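I had never seen .text used with Beautiful Soup 4 before; I use .strings instead. If you want to do it that way, this works:

from bs4 import BeautifulSoup

text = "<td colspan='2' class='ToEx'>This is a test (<i> to see </i> this works) and I really hope it does</td>"
soup = BeautifulSoup(text, "html.parser")

# soup.strings yields every text node in document order
for string in soup.strings:
    print(str(string), end="")

Both versions output:

This is a test (to see this works) and I really hope it does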

Both work equally well, but get_text() is easier to use, especially if you want to store the text in a variable.
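Storing the result works either way; the `.strings` route just needs an explicit join. This minimal sketch, assuming the `"html.parser"` backend, builds the same string both ways and also shows `stripped_strings`, a generator that skips whitespace-only strings and strips the rest.

```python
from bs4 import BeautifulSoup

text = "<td colspan='2' class='ToEx'>This is a test (<i> to see </i> this works) and I really hope it does</td>"
soup = BeautifulSoup(text, "html.parser")

# Both expressions build the full text of the document as one string
via_get_text = soup.get_text()
via_strings = "".join(soup.strings)

# stripped_strings strips each text node and drops whitespace-only ones
pieces = list(soup.stripped_strings)

print(via_get_text == via_strings)
print(pieces)
```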


2014-07-24 10:36:27 TheDarkTurtle
