Using BeautifulSoup

I am building a text crawler with BeautifulSoup. However, when I run the code below, I get this error:

Traceback (most recent call last): 
    File "D:\Python27\Crawling.py", line 33, in <module> 
    text = content.get_text() 
AttributeError: 'NoneType' object has no attribute 'get_text'

I would be very grateful if you could tell me how to fix this.

import codecs 
import urllib 
from bs4 import BeautifulSoup 
import xml.dom.minidom 

keyWord = raw_input("Enter the key-word : ") 
#Enter my Search KeyWord 

address = "http://openapi.naver.com/search?key=8d4b5b7fef7a607863013302754262a3&query="     + keyWord + "&display=5&start=1&target=kin&sort=sim" 

search_result = urllib.urlopen(address) 
raw_data = search_result.read() 
parsed_result = xml.dom.minidom.parseString(raw_data) 
links = parsed_result.getElementsByTagName('link') 

source_URL = links[3].firstChild.nodeValue 
#The number 3 has no meaning, it has 0 to 9 and I just chose 3 
page = urllib.urlopen(source_URL).read() 

#save as html file 
g = open(keyWord + '.html', 'w') 
g.write(page) 
g.close() 

#open html file 
g = open(keyWord + '.html', 'r') 
bs = BeautifulSoup(g) 
g.close() 


content = bs.find(id="end_content") 
text = content.get_text() 

#save as text file 
h = codecs.open(keyWord + '.txt', 'w', 'utf-8') 
h.write(keyWord + ' ') 
h.write(text) 
h.close() 

print "file created"

Source

2014-03-31 user3473222

The error is simple: find() returns None when it cannot find anything. Your 'content' lookup returned *nothing*, so you cannot call 'get_text' on it. – Manhattan

Thank you, God bless. – user3473222

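The problem originates on this line:

content = bs.find(id="end_content")

Your soup, bs, contains no element with id="end_content". When BeautifulSoup cannot find an element, it does not raise an error; it simply returns None. The next line then calls get_text() on that None, which raises the AttributeError. Look at the source HTML and double-check that the id is correct.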
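
A minimal sketch of checking for None before calling get_text(). The HTML string is a made-up stand-in for the downloaded page (an assumption, not the real Naver markup), and "html.parser" is just the parser that ships with Python:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page: it has a div with
# class="end_content" but no element with id="end_content".
html = '<html><body><div class="end_content">hello</div></body></html>'
soup = BeautifulSoup(html, "html.parser")

content = soup.find(id="end_content")
if content is None:
    # find() found no match. Calling content.get_text() here would raise
    # AttributeError: 'NoneType' object has no attribute 'get_text'.
    text = ""
else:
    text = content.get_text()

print(repr(text))
```

Whether an empty string, a log message, or an exception is the right fallback depends on what the crawler should do with pages that do not match.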

For handling the URL, I recommend taking a look at the requests module. It is much more robust than simply concatenating strings.

Source
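
To illustrate why concatenation is fragile, here is a rough sketch that builds the same query string with the standard library's urlencode instead of `+`. Note that this swaps in the standard library rather than requests, so it runs without extra installs. The helper name build_search_url and the "YOUR_API_KEY" placeholder are made up for illustration; the parameter names are the ones from the question's URL.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2, as used in the question


def build_search_url(keyword, api_key="YOUR_API_KEY"):
    """Build the search URL; build_search_url and api_key are hypothetical."""
    params = [
        ("key", api_key),
        ("query", keyword),  # urlencode escapes spaces, '&', and so on
        ("display", 5),
        ("start", 1),
        ("target", "kin"),
        ("sort", "sim"),
    ]
    return "http://openapi.naver.com/search?" + urlencode(params)


# A keyword containing a space and '&' would corrupt a hand-built URL.
print(build_search_url("hand bag & shoes"))
```

With requests, the equivalent is to pass the parameters as a dict through the params argument of requests.get(), which does the escaping for you.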

2014-03-31 13:52:50 Hooked

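Thanks, that was very helpful. – user3473222

Yes, 'requests' is a great library. Fully recommended. – franzlorenzon

@user3473222 No problem, and welcome to Stack Overflow! You don't need to thank each of us individually. Once you have enough reputation, you can upvote _good_ questions and _good_ answers. You can also accept the best answer. Please do so, so that others can see what solved the problem. – Hooked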

The problem is that there is a div with class="end_content", but no element with id="end_content".

Replace:

content = bs.find(id="end_content")

with (you need to use class_ here because class is a reserved keyword in Python):

content = bs.find("div", class_="end_content")

or:

content = bs.find("div", {"class": "end_content"})

Also, for performance and explicitness, it is better to specify the tag name: the element is going to be a div, so pass "div" as the first argument.

Source
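
A small sketch comparing the lookups above on a made-up snippet (the HTML is an assumption for illustration, not the real page). It also shows that passing "div" skips a non-div element that happens to carry the same class:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a <p> and a <div> share the class "end_content",
# and nothing has id="end_content".
html = (
    '<div id="wrap">'
    '<p class="end_content">sidebar</p>'
    '<div class="end_content">answer text</div>'
    '</div>'
)
soup = BeautifulSoup(html, "html.parser")

by_keyword = soup.find("div", class_="end_content")
by_dict = soup.find("div", {"class": "end_content"})
any_tag = soup.find(class_="end_content")  # no tag name: first match wins
by_id = soup.find(id="end_content")

print(by_keyword.get_text())     # text of the div
print(by_keyword == by_dict)     # both spellings find the same element
print(any_tag.name)              # the <p> comes first in the document
print(by_id)                     # None: no element has that id
```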

2014-03-31 13:55:33 alecxe

Taking both @Hooked's and @alecxe's answers into account, here is how to do it with requests. I will use the keyword handbag for the search query.

import requests as rq 
from bs4 import BeautifulSoup as bsoup 
from xml.dom.minidom import parseString 

# Query the search API (keyword "handbag") and parse the XML response. 
url = "http://openapi.naver.com/search?key=8d4b5b7fef7a607863013302754262a3&query=handbag&display=100&start=1&target=kin&sort=sim" 
result = rq.get(url) 
parsed_result = parseString(result.content) 
links = parsed_result.getElementsByTagName("link") 

# Fetch the page behind one of the <link> elements (index 3 is arbitrary). 
new_url = links[3].firstChild.nodeValue 
new_result = rq.get(new_url).content 

# Save the raw page to disk, then read it back into BeautifulSoup. 
with open("handbag.html", "wb") as g: 
    g.write(new_result) 

with open("handbag.html", "rb") as g: 
    soup = bsoup(g) 

# Look the div up by class, not by id. 
content = soup.find("div", class_="end_content") 
text = content.get_text() 

print text.encode("utf-8").strip()

The .encode("utf-8") part is there to handle output of Korean characters. The result is as follows:

아디다스 그래픽핸드백 
거의품절이던데............ 
어디파는데알수없을가요 ㅜ ㅜ ??!?!? 
[Finished in 4.7s]

Let us know if this helps.

Source

2014-03-31 14:11:33 Manhattan

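If find_all() cannot find anything, it returns an empty list, whereas find() returns None. You can find this in the Beautiful Soup Documentation.

In your code, the failing call is effectively None.get_text().

Source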
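
A quick sketch of the difference, using a made-up one-line document (an assumption for illustration) that contains no matching element:

```python
from bs4 import BeautifulSoup

# Hypothetical document with nothing that has id="end_content".
soup = BeautifulSoup("<p>no matching element here</p>", "html.parser")

matches = soup.find_all(id="end_content")  # empty list-like result
first = soup.find(id="end_content")        # None

print(len(matches))
print(first is None)

# Iterating over the empty find_all() result is safe: the loop body never
# runs, so get_text() is never called on a missing element.
texts = [tag.get_text() for tag in matches]
print(texts)
```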

2016-11-11 11:19:40
