How to extract Amazon product links using Python

I am a beginner in Python. I want to scrape product links from an Amazon page. For example, I want to scrape this page: http://www.amazon.com/s/ref=sr_in_-2_p_4_18?me=A3MZ96G5C78IVQ&fst=as%3Aoff&rh=p_4%3AFunKo&ie=UTF8&qid=1477811368. As output I just want the links behind the product titles.

from bs4 import BeautifulSoup 
import requests 
url = "http://www.amazon.com/s/ref=sr_in_-2_p_4_18?me=A3MZ96G5C78IVQ&fst=as%3Aoff&rh=p_4%3AFunKo&ie=UTF8&qid=1477811368" 
r = requests.get(url) 
soup = BeautifulSoup(r.content, "lxml") 

file = open("parseddata.txt", "wb") 

links = soup.find_all('a', {'class': 'a-link-normal s-access-detail-page a-text-normal'}) 

for link in links:
    print(link.get('href'))
    file.write(href + '\n')
file.close()

This is the code I am using. Can anyone tell me what I am doing wrong?

Source

2016-10-30 Gurpreet Singh

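What do you expect your code to do, and what does it actually do? Do you get an error message or a warning? Is the result wrong? If so, in what way? –

@Gurpeet Singh You should not be doing this (if it is for something serious). I hope you know that Amazon has an API for developers. – danidee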

Add a User-Agent to the request headers so that the request looks like it comes from a browser rather than a robot.

from bs4 import BeautifulSoup
import requests

url = "http://www.amazon.com/s/ref=sr_in_-2_p_4_18?me=A3MZ96G5C78IVQ&fst=as%3Aoff&rh=p_4%3AFunKo&ie=UTF8&qid=1477811368"

# Send a browser-like User-Agent header with the request
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36'
}
r = requests.get(url, headers=headers)
soup = BeautifulSoup(r.content, "lxml")

# Select the product title anchors by their class attribute
links = soup.find_all('a', {'class': 'a-link-normal s-access-detail-page a-text-normal'})

# Open the file in text mode ("w"), since the hrefs are strings;
# the with block closes the file when the loop is done
with open("parseddata.txt", "w") as file:
    for link in links:
        print(link.get('href'))
        file.write(link.get('href') + '\n')

Result

https://www.amazon.com/Funko-POP-Marvel-Dancing-Bobble/dp/B00N1EJXUU/ref=sr_1_1/160-5408618-6684940?m=A3MZ96G5C78IVQ&s=merchant-items&ie=UTF8&qid=1477822032&sr=1-1&refinements=p_4%3AFunKo 
https://www.amazon.com/Funko-POP-Movies-Potter-Action/dp/B019JIA4IQ/ref=sr_1_2/160-5408618-6684940?m=A3MZ96G5C78IVQ&s=merchant-items&ie=UTF8&qid=1477822032&sr=1-2&refinements=p_4%3AFunKo 
https://www.amazon.com/FunKo-2390-Funko-Darth-Maul/dp/B005F1QBMK/ref=sr_1_3/160-5408618-6684940?m=A3MZ96G5C78IVQ&s=merchant-items&ie=UTF8&qid=1477822032&sr=1-3&refinements=p_4%3AFunKo 
........
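
The links above still carry a `ref=...` path segment and a query string with search parameters such as `qid` and `sr`. If only the product part of each link is wanted, something along these lines might help. This is a minimal sketch using only the standard library; `strip_tracking` is a hypothetical helper name, and it assumes that everything from the first `ref=` path segment onward can be dropped and that the shortened URL still leads to the same product page, which is not confirmed here.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_tracking(link):
    """Drop the query string and everything from the first 'ref=' path segment on.

    Hypothetical helper: assumes the remaining path still identifies the product.
    """
    parts = urlsplit(link)
    kept = []
    for segment in parts.path.split('/'):
        # Stop at the first path segment that starts with "ref="
        if segment.startswith('ref='):
            break
        kept.append(segment)
    # Rebuild the URL with an empty query string and an empty fragment
    return urlunsplit((parts.scheme, parts.netloc, '/'.join(kept), '', ''))


link = ("https://www.amazon.com/Funko-POP-Marvel-Dancing-Bobble/dp/B00N1EJXUU/"
        "ref=sr_1_1/160-5408618-6684940?m=A3MZ96G5C78IVQ&s=merchant-items"
        "&ie=UTF8&qid=1477822032&sr=1-1&refinements=p_4%3AFunKo")
print(strip_tracking(link))
```

Inside the loop this could be used as `file.write(strip_tracking(link.get('href')) + '\n')`.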

Source

2016-10-30 10:10:48 Aaron

@Aaron, thank you. –
