You can use the lxml module to fetch the data from a URL like the one you provided. The code further down gives this result:
[{'close': '6/1', 'gross': '$70,165,972', 'open': '1/27', 'title': "Big Momma's House 2"}, {'close': '3/12', 'gross': '$62,318,875', 'open': '1/20', 'title': 'Underworld: Evolution'}, {'close': '2/16', 'gross': '$47,326,473', 'open': '1/6', 'title': 'Hostel'}, {'close': '5/4', 'gross': '$47,144,110', 'open': '1/27', 'title': 'Nanny McPhee'}, {'close': '5/11', 'gross': '$42,647,449', 'open': '1/13', 'title': 'Glory Road'}, {'close': '3/9', 'gross': '$38,399,961', 'open': '1/13', 'title': 'Last Holiday'}, {'close': '4/13', 'gross': '$17,127,992', 'open': '1/27', 'title': 'Annapolis'}, {'close': '3/30', 'gross': '$14,734,633', 'open': '1/13', 'title': 'Tristan and Isolde'}, {'close': '3/9', 'gross': '$11,967,000', 'open': '1/20', 'title': 'End of the Spear'}, {'close': '6/25', 'gross': '$10,407,978', 'open': '1/27', 'title': 'Roving Mars (IMAX)'}, {'close': '2/23', 'gross': '$6,090,172', 'open': '1/6', 'title': "Grandma's Boy"}, {'close': '1/22', 'gross': '$2,405,420', 'open': '1/6', 'title': 'BloodRayne'}, {'close': '4/6', 'gross': '$2,197,694', 'open': '1/27', 'title': 'Rang De Basanti'}, {'close': '5/18', 'gross': '$1,439,972', 'open': '1/20', 'title': 'Why We Fight'}, {'close': '5/4', 'gross': '$1,253,413', 'open': '1/27', 'title': 'Tristram Shandy: A Cock and Bull Story'}, {'close': '3/9', 'gross': '$888,975', 'open': '1/20', 'title': 'Looking for Comedy in the Muslim World'}, {'close': '3/23', 'gross': '$672,243', 'open': '1/27', 'title': 'Imagine Me and You'}, {'close': '1/29', 'gross': '$332,491', 'open': '1/13', 'title': 'Zinda'}, {'close': '3/9', 'gross': '$274,245', 'open': '1/20', 'title': 'Dirty'}, {'close': '5/4', 'gross': '$196,857', 'open': '1/6', 'title': 'Fateless'}, {'close': '2/23', 'gross': '$145,626', 'open': '1/27', 'title': 'Bubble'}, {'close': '3/23', 'gross': '$78,378', 'open': '1/27', 'title': 'Manderlay'}, {'close': '2/12', 'gross': '$65,429', 'open': '1/20', 'title': 'The Real Dirt on Farmer John'}, {'close': '2/26', 'gross': '$55,398', 
'open': '1/13', 'title': 'That Man: Peter Berlin'}, {'close': '8/24', 'gross': '$53,580', 'open': '1/27', 'title': 'La Petite Jerusalem'}, {'close': '2/2', 'gross': '$29,710', 'open': '1/13', 'title': 'Henri Cartier-Bresson: The Impassioned Eye'}, {'close': '4/6', 'gross': '$24,038', 'open': '1/13', 'title': 'When the Sea Rises'}, {'close': '1/16', 'gross': '$20,055', 'open': '1/11', 'title': 'State of Fear'}, {'close': '4/9', 'gross': '$17,341', 'open': '1/13', 'title': 'Film Geek'}, {'close': '3/30', 'gross': '$16,377', 'open': '1/13', 'title': "April's Shower"}, {'close': '1/29', 'gross': '$11,290', 'open': '1/27', 'title': 'Live Freaky! Die Freaky!'}, {'close': '1/26', 'gross': '$5,716', 'open': '1/20', 'title': 'Pizza'}]
You can refer to the documentation. The code:
import requests
from lxml import html

url = "http://www.boxofficemojo.com/monthly/?view=releasedate&chart=&month=1&yr=2006"
response = requests.get(url)
soup = html.fromstring(response.content)

result_list = []
# Skip the first two rows of the table
for row in soup.xpath('//div[@id="body"]/center/table')[0].xpath('.//tr')[2:]:
    # Collect every text node inside the row's cells
    data = row.xpath('./td//text()')
    # data[8] is read below, so the row needs at least 9 text nodes
    if len(data) > 8:
        result_list.append({'title': data[1].strip(), 'gross': data[3].strip(),
                            'open': data[7].strip(), 'close': data[8].strip()})
print(result_list)
Running it produces the list shown above. The documentation covers scraping and lxml in more depth.
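To experiment with the XPath expressions without hitting the site, here is a minimal sketch that runs the same extraction on an inline HTML fragment. The `SAMPLE` markup and the `parse_rows` helper are hypothetical: the real page is only assumed to have this layout, and the `n/a` cells are placeholders rather than real data.

```python
from lxml import html

# Hypothetical, simplified stand-in for the page markup (assumption).
# Cells marked "n/a" are placeholders for columns the answer does not use.
SAMPLE = """
<html><body><div id="body"><center><table>
<tr><td>banner</td></tr>
<tr><td>Rank</td><td>Title</td><td>c2</td><td>Gross</td><td>c4</td><td>c5</td><td>c6</td><td>Open</td><td>Close</td></tr>
<tr><td>1</td><td><a href="#">Hostel</a></td><td>n/a</td><td>$47,326,473</td><td>n/a</td><td>n/a</td><td>n/a</td><td>1/6</td><td>2/16</td></tr>
<tr><td>2</td><td>Pizza</td><td>n/a</td><td>$5,716</td><td>n/a</td><td>n/a</td><td>n/a</td><td>1/20</td><td>1/26</td></tr>
<tr><td colspan="9">Total</td></tr>
</table></center></div></body></html>
"""


def parse_rows(page_source):
    """Return one dict per data row, in the order the rows appear."""
    tree = html.fromstring(page_source)
    table = tree.xpath('//div[@id="body"]/center/table')[0]
    results = []
    # Skip the first two rows, as the answer's code does.
    for row in table.xpath('.//tr')[2:]:
        cells = [text.strip() for text in row.xpath('./td//text()')]
        # Rows with fewer than 9 text nodes (such as a totals row) are skipped.
        if len(cells) > 8:
            results.append({'title': cells[1], 'gross': cells[3],
                            'open': cells[7], 'close': cells[8]})
    return results


movies = parse_rows(SAMPLE)
print(movies)
```

Keeping the parsing in a function that takes the page source makes it easy to check the XPath against saved HTML before pointing it at `response.content`.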
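This is what I was looking for. Thank you!! I will look through the documentation on the xpath stuff. – hklee93
By the way, is there a way to keep the result list from being sorted alphabetically? I want it in the order I added it. – hklee93
As you can see in the code, no sorting logic is used; the results are in the same order as the source page. –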
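If the alphabetical order in question is the order of the keys inside each dictionary (an assumption about what the comment means), that comes from how older Python versions print a plain `dict`, not from any sorting. A minimal sketch using `collections.OrderedDict`, which keeps keys in insertion order; the `row` list here is a hypothetical stand-in for one row's `data`, with `n/a` as placeholder cells:

```python
from collections import OrderedDict

# Hypothetical stand-in for one row's text nodes (assumption).
row = ["1", "Hostel", "n/a", "$47,326,473", "n/a", "n/a", "n/a", "1/6", "2/16"]

# Keys come back in the order they were inserted.
entry = OrderedDict([
    ("title", row[1]),
    ("gross", row[3]),
    ("open", row[7]),
    ("close", row[8]),
])

keys = list(entry.keys())
print(keys)
```

On Python 3.7 and later a plain `dict` also preserves insertion order, so this is only needed on older interpreters.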