2014-11-27

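Web crawler in Python that displays URL titles

Here's the deal. I have a web crawler set up to parse a URL entered by the user. So far it works and prints the page source of that URL. Here is what remains: I need it to display the titles of all the URLs contained in the page. For example, if the user wants to parse nytimes.com, the bot should display the title of every link on the page that leads to another URL, such as "Best Thanksgiving Recipes". Here's my code: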

import urllib2 

website = raw_input('Enter the website url: ') 

getwebsite = urllib2.urlopen(website) 
readwebsite = getwebsite.read() 
print readwebsite 

Answer

You can use BeautifulSoup to extract all the links. Using your own code:

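# Python 2 (urllib2, raw_input)
import urllib2
from bs4 import BeautifulSoup

website = raw_input('Enter the website url: ')
get_website = urllib2.urlopen(website)
read_website = get_website.read()
# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(read_website, 'html.parser')
# href of every <a> tag; None for anchors without an href
print([a.get('href') for a in soup.find_all("a")])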
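
The list comprehension above prints only the href values, while the question asks for the link titles, i.e. the text inside each <a> element. Below is a minimal sketch that collects both. It is written for Python 3 and uses the standard library's html.parser instead of BeautifulSoup, so it is a swapped-in technique and not the answerer's code. The names LinkTitleParser and link_titles, and the sample markup with its /dining/thanksgiving.html path, are hypothetical. With BeautifulSoup, the text of each link should be available through a.get_text().

```python
from html.parser import HTMLParser


class LinkTitleParser(HTMLParser):
    """Collect (title, href) pairs for every <a> element that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []     # finished (title, href) pairs
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # ignore anchors with a missing or empty href
                self._href = href
                self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open link
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace in the link text
            title = " ".join("".join(self._text).split())
            self.links.append((title, self._href))
            self._href = None


def link_titles(html):
    """Return a list of (title, href) tuples found in an HTML string."""
    parser = LinkTitleParser()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical sample markup, used here instead of a live page
sample = """
<a href="/dining/thanksgiving.html">Best Thanksgiving <b>Recipes</b></a>
<a name="top"></a>
<a href="http://www.nytimes.com/video">Video</a>
"""

for title, href in link_titles(sample):
    print(title, "->", href)
```

To run it on a fetched page, pass the decoded page source to link_titles in place of sample.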
The href extraction can also be written with requests:

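from bs4 import BeautifulSoup
import requests

r = requests.get("http://www.nytimes.com")

# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(r.content, 'html.parser')
# href of every <a> tag; None for anchors without an href
print([a.get('href') for a in soup.find_all("a")])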
['http://www.nytimes.com/content/help/site/ie8-support.html', '#top-news', '#site-index-navigation', 'http://international.nytimes.com', 'http://cn.nytimes.com', 'http://www.nytimes.com/pages/todayspaper/index.html', 'http://www.nytimes.com/video', 'http://www.nytimes.com/weather', 'http://www.nytimes.com/pages/world/index.html', 'http://www.nytimes.com/pages/national/index.html', 'http://www.nytimes.com/pages/politics/index.html', 'http://www.nytimes.com/pages/nyregion/index.html', 'http://www.nytimes.com/pages/business/index.html', 'http://www.nytimes.com/pages/business/international/index.html', 'http://www.nytimes.com/pages/opinion/index.html', 'http://www.nytimes.com/pages/opinion/international/index.html', 'http://www.nytimes.com/pages/technology/index.html', 'http://www.nytimes.com/pages/science/index.html', 'http://www.nytimes.com/pages/health/index.html', 'http://www.nytimes.com/pages/sports/index.html', 'http://www.nytimes.com/pages/sports/international/index.html', 'http://www.nytimes.com/pages/arts/index.html', 'http://www.nytimes.com/pages/arts/international/index.html', 'http://www.nytimes.com/pages/fashion/index.html', 'http://www.nytimes.com/pages/style/international/index.html', 'http://www.nytimes.com/pages/dining/index.html', 'http://www.nytimes.com/pages/dining/international/index.html', 'http://www.nytimes.com/pages/garden/index.html', 'http://www.nytimes.com/pages/travel/index.html', 'http://www.nytimes.com/pages/magazine/index.html', 'http://www.nytimes.com/pages/realestate/index.html', 'http://www.nytimes.com', 'http://international.nytimes.com/?iht', 'http://www.nytimes.com/pages/world/index.html', 'http://www.nytimes.com/pages/national/index.html', 'http://www.nytimes.com/pages/politics/index.html', 'http://www.nytimes.com/pages/nyregion/index.html', 'http://www.nytimes.com/pages/business/index.html', 'http://www.nytimes.com/pages/business/international/index.html', 'http://www.nytimes.com/pages/opinion/index.html', 
'http://www.nytimes.com/pages/opinion/international/index.html', 'http://www.nytimes.com/pages/technology/index.html', 'http://www.nytimes.com/pages/science/index.html', 'http://www.nytimes.com/pages/health/index.html', 'http://www.nytimes.com/pages/sports/index.html', 'http://www.nytimes.com/pages/sports/international/index.html', 'http://www.nytimes.com/pages/arts/index.html', 'http://www.nytimes.com/pages/arts/international/index.html', 'http://www.nytimes.com/pages/fashion/index.html', 'http://www.nytimes.com/pages/style/international/index.html', 'http://www.nytimes.com/pages/dining/index.html', 'http://www.nytimes.com/pages/dining/international/index.html', 'http://www.nytimes.com/pages/garden/index.html', 'http://www.nytimes.com/pages/travel/index.html', 'http://www.nytimes.com/pages/magazine/index.html', 'http://www.nytimes.com/pages/realestate/index.html', 'http://www.nytimes.com/pages/obituaries/index.html', 'http://www.nytimes.com/video/', 'http://www.nytimes.com/upshot/', None, '', 'http://www.nytimes.com/pages/world/africa/index.html', 'http://www.nytimes.com/pages/world/americas/index.html', 'http://www.nytimes.com/pages/world/asia/index.html', 'http://www.nytimes.com/pages/world/europe/index.html', 'http://www.nytimes.com/pages/world/middleeast/index.html', 'http://atwar.blogs.nytimes.com/', 'http://india.blogs.nytimes.com/', 'http://sinosphere.blogs.nytimes.com/', 'http://www.nytimes.com/pages/education/index.html', 'http://www.nytimes.com/politics/first-draft/', 'http://elections.nytimes.com/', 'http://cityroom.blogs.nytimes.com/', 'http://artsbeat.blogs.nytimes.com/', 'http://www.nytimes.com/events/', 'http://dealbook.nytimes.com/', 'http://www.nytimes.com/pages/business/economy/index.html', 'http://www.nytimes.com/pages/business/energy-environment/index.html', 'http://markets.on.nytimes.com/', 'http://www.nytimes.com/pages/business/media/index.html', 'http://www.nytimes.com/pages/business/smallbusiness/index.html', 
'http://www.nytimes.com/pages/your-money/index.html', 'http://dealbook.nytimes.com/', 'http://www.nytimes.com/pages/business/economy/index.html', 'http://www.nytimes.com/pages/business/energy-environment/index.html', 'http://markets.on.nytimes.com/', 'http://www.nytimes.com/pages/business/media/index.html', 'http://www.nytimes.com/pages/business/smallbusiness/index.html', 'http://www.nytimes.com/pages/your-money/index.html', 'http://www.nytimes.com/pages/opinion/index.html#columnists', 'http://www.nytimes.com/pages/opinion/index.html#editorials', 'http://www.nytimes.com/pages/opinion/index.html#contributing', 'http://www.nytimes.com/pages/opinion/index.html#op-ed', 'http://www.nytimes.com/pages/opinion/index.html#opinionator', 'http://www.nytimes.com/pages/opinion/index.html#letters', 'http://www.nytimes.com/pages/opinion/index.html#sundayreview', 'http://www.nytimes.com/pages/opinion/index.html#takingNote', 'http://www.nytimes.com/pages/opinion/index.html#roomfordebate', 'http://publiceditor.blogs.nytimes.com/', 'http://wordplay.blogs.nytimes.com/cartoons/', 'http://www.nytimes.com/pages/opinion/international/index.html#columnistsGlobal', 'http://www.nytimes.com/pages/opinion/international/index.html#editorialsGlobal', 'http://www.nytimes.com/pages/opinion/international/index.html#contributing', 'http://www.nytimes.com/pages/opinion/international/index.html#op-edGlobal', 'http://www.nytimes.com/pages/opinion/index.html#opinionator', 'http://www.nytimes.com/pages/opinion/international/index.html#letters', 'http://www.nytimes.com/pages/opinion/index.html#sundayreview', 'http://www.nytimes.com/pages/opinion/international/index.html#takingNote', 'http://www.nytimes.com/pages/opinion/international/index.html#roomfordebate', 'http://publiceditor.blogs.nytimes.com/', 'http://wordplay.blogs.nytimes.com/cartoons/', 'http://bits.blogs.nytimes.com/', 'http://www.nytimes.com/pages/technology/personaltech/index.html', 'http://www.nytimes.com/pages/science/earth/index.html', 
'http://www.nytimes.com/pages/science/space/index.html', 'http://well.blogs.nytimes.com/', 'http://www.nytimes.com/health/guides/index.html', 'http://www.nytimes.com/pages/health/nutrition/index.html', 'http://www.nytimes.com/pages/health/policy/index.html', 'http://newoldage.blogs.nytimes.com/', 'http://www.nytimes.com/pages/health/views/index.html', 'http://www.nytimes.com/pages/sports/baseball/index.html', 'http://www.nytimes.com/pages/sports/ncaabasketball/index.html', 'http://www.nytimes.com/pages/sports/basketball/index.html', 'http://www.nytimes.com/pages/sports/ncaafootball/index.html', 'http://www.nytimes.com/pages/sports/football/index.html', 'http://www.nytimes.com/pages/sports/golf/index.html', 'http://www.nytimes.com/pages/sports/hockey/index.html', 'http://www.nytimes.com/pages/sports/soccer/index.html', 'http://www.nytimes.com/pages/sports/tennis/index.html', 'http://www.nytimes.com/pages/sports/baseball/index.html', 'http://www.nytimes.com/pages/sports/ncaabasketball/index.html', 'http://www.nytimes.com/pages/sports/basketball/index.html', 'http://www.nytimes.com/pages/sports/ncaafootball/index.html', 'http://www.nytimes.com/pages/sports/football/index.html', 'http://www.nytimes.com/pages/sports/golf/index.html', 'http://www.nytimes.com/pages/sports/hockey/index.html', 'http://www.nytimes.com/pages/sports/soccer/index.html', 'http://www.nytimes.com/pages/sports/tennis/index.html', 'http://www.nytimes.com/pages/arts/design/index.html', 'http://artsbeat.blogs.nytimes.com/', 'http://www.nytimes.com/pages/books/index.html', 'http://www.nytimes.com/pages/arts/dance/index.html', 'http://www.nytimes.com/pages/movies/index.html', 'http://www.nytimes.com/pages/arts/music/index.html', 'http://www.nytimes.com/events/', 'http://www.nytimes.com/pages/arts/television/index.html', 'http://www.nytimes.com/pages/theater/index.html', 'http://www.nytimes.com/pages/arts/video-games/index.html', 'http://www.nytimes.com/pages/arts/design/index.html', 
'http://artsbeat.blogs.nytimes.com/', 'http://www.nytimes.com/pages/books/index.html', 'http://www.nytimes.com/pages/arts/dance/index.html', 'http://www.nytimes.com/pages/movies/index.html', 'http://www.nytimes.com/pages/arts/music/index.html', 'http://www.nytimes.com/events/', 'http://www.nytimes.com/pages/arts/television/index.html', 'http://www.nytimes.com/pages/theater/index.html', 'http://www.nytimes.com/pages/arts/video-games/index.html', 'http://www.nytimes.com/pages/t-magazine/index.html', 'http://parenting.blogs.nytimes.com/', 'http://runway.blogs.nytimes.com/', 'http://www.nytimes.com/pages/fashion/weddings/index.html', 'http://www.nytimes.com/pages/t-magazine/index.html', 'http://parenting.blogs.nytimes.com/', 'http://runway.blogs.nytimes.com/', 'http://www.nytimes.com/pages/fashion/weddings/index.html', 'http://cooking.nytimes.com', 'http://www.nytimes.com/restaurants/search/', 'http://cooking.nytimes.com', 'http://www.nytimes.com/restaurants/search/', 'http://www.nytimes.com/pages/realestate/commercial/index.html', 'http://www.nytimes.com/pages/great-homes-and-destinations/index.html', 'http://realestate.nytimes.com/my/saved_listings.aspx', 'http://www.nytimes.com/video/us-politics', 'http://www.nytimes.com/video/world', 'http://www.nytimes.com/video/n-y-region', 'http://www.nytimes.com/video/opinion', 'http://www.nytimes.com/video/times-documentaries', 'http://www.nytimes.com/video/business', 'http://www.nytimes.com/video/technology', 'http://www.nytimes.com/video/arts', 'http://www.nytimes.com/video/style', 'http://www.nytimes.com/video/health', 'http://www.nytimes.com/video/dining-and-wine', 'http://www.nytimes.com/video/travel', 'http://www.nytimes.com/video/sports', 'http://www.nytimes.com/video/real-estate', 'http://www.nytimes.com/video/science', 'http://www.nytimes.com/crosswords/', 'http://www.nytimes.com/times-insider', 'http://www.nytimes.com/pages/todayspaper/index.html', 'http://www.nytimes.com/pages/automobiles/index.html', 
'http://www.nytimes.com/pages/corrections/index.html', 'http://www.nytimes.com/pages/multimedia/index.html', 'http://lens.blogs.nytimes.com/', 'http://www.nytimes.com/ref/classifieds/', 'http://www.nytimes.com/marketing/tools-and-services/', 'http://jobmarket.nytimes.com/pages/jobs/index.html', 'http://www.nytimes.com/pages/topics/', 'http://www.nytimes.com/interactive/blogs/directory.html', 'http://www.nytstore.com/?&t=qry542&utm_source=nytimes&utm_medium=HPB&utm_content=hp_browsetree&utm_campaign=NYT-HP&module=SectionsNav&action=click&region=TopBar&version=BrowseTree&contentCollection=NYT%20Store&contentPlacement=2&pgtype=Homepage', 'http://www.nytimes.com/times-journeys/?utm_source=nytimes&utm_medium=HPLink&utm_content=hp_browsetree&utm_campaign=NYT-HP&module=SectionsNav&action=click&region=TopBar&version=BrowseTree&contentCollection=Times%20Journeys&contentPlacement=2&pgtype=Homepage', 'http://www.nytimes.com/seeallnav', 'http://www.nytimes.com/membercenter', '', 'http://www.nytimes.com/pages/opinion/index.html#columnists/charlesMBlow', 'http://www.nytimes.com/pages/opinion/index.html#columnists/davidBrooks', 'http://www.nytimes.com/pages/opinion/index.html#columnists/frankBruni', 'http://www.nytimes.com/pages/opinion/index.html#columnists/rogerCohen', 'http://www.nytimes.com/pages/opinion/index.html#columnists/gailCollins', 'http://www.nytimes.com/pages/opinion/index.html#columnists/rossDouthat', 'http://www.nytimes.com/pages/opinion/index.html#columnists/maureenDowd', 'http://www.nytimes.com/pages/opinion/index.html#columnists/thomasLFriedman', 'http://www.nytimes.com/pages/opinion/index.html#columnists/nicholasDKristof', 'http://www.nytimes.com/pages/opinion/index.html#columnists/paulKrugman', 'http://www.nytimes.com/pages/opinion/index.html#columnists/joeNocera', 'http://www.nytimes.com/pages/opinion/index.html#columnists/charlesMBlow', 'http://www.nytimes.com/pages/opinion/index.html#columnists/davidBrooks', 
'http://www.nytimes.com/pages/opinion/index.html#columnists/frankBruni', 'http://www.nytimes.com/pages/opinion/index.html#columnists/rogerCohen', 'http://www.nytimes.com/pages/opinion/index.html#columnists/gailCollins', 'http://www.nytimes.com/pages/opinion/index.html#columnists/rossDouthat', 'http://www.nytimes.com/pages/opinion/index.html#columnists/maureenDowd', 'http://www.nytimes.com/pages/opinion/index.html#columnists/thomasLFriedman', 'http://www.nytimes.com/pages/opinion/index.html#columnists/nicholasDKristof', 'http://www.nytimes.com/pages/opinion/index.html#columnists/paulKrugman', 'http://www.nytimes.com/pages/opinion/index.html#columnists/joeNocera', 'http://www.nytimes.com/2014/11/28/business/drug-maker-gave-large-payments-to-doctors-with-troubled-track-records.html', 'http://www.nytimes.com/upshot', 'http://www.nytimes.com/2014/11/28/upshot/under-pressure-from-uber-taxi-medallion-prices-are-plummeting.html', 'http://www.nytimes.com/2014/11/28/upshot/under-pressure-from-uber-taxi-medallion-prices-are-plummeting.html?hp&target=comments#commentsContainer', 'http://www.nytimes.com/2014/11/27/us/without-passing-a-single-law-obama-crafts-bold-enviornmental-policy.html', 'http://www.nytimes.com/2014/11/27/us/ferguson-experts-weigh-darren-wilsons-decisions-leading-to-fatal-shooting-of-michael-brown.html', 'http://www.nytimes.com/2014/11/27/us/ferguson-experts-weigh-darren-wilsons-decisions-leading-to-fatal-shooting-of-michael-brown.html?hp&target=comments#commentsContainer', 'http://www.nytimes.com/2014/11/28/us/ferguson-protests-michael-brown-darren-wilson.html', 'http://www.nytimes.com/2014/11/27/us/after-disputed-verdict-reckoning-for-ferguson.html', 'http://www.nytimes.com/2014/11/28/arts/international/p-d-james-mystery-novelist-known-as-queen-of-crime-dies-at-94.html', 'http://www.nytimes.com/2014/11/28/arts/international/p-d-james-mystery-novelist-known-as-queen-of-crime-dies-at-94.html', 
'http://www.nytimes.com/2014/11/28/world/middleeast/iran-nuclear-talks-extension.html', 'http://www.nytimes.com/2014/11/26/world/middleeast/iran-nuclear-talks-extension.html', 'http://www.nytimes.com/2014/11/30/magazine/the-militarys-rough-justice-on-sexual-assault.html', 'http://www.nytimes.com/2014/11/30/magazine/the-militarys-rough-justice-on-sexual-assault.html', 'http://www.nytimes.com/2014/11/30/magazine/the-militarys-rough-justice-on-sexual-assault.html?hp&target=comments#commentsContainer', 'http://www.nytimes.com/2014/11/28/sports/ncaafootball/mit-is-10-0-and-finding-success-in-ncaa-division-iii-playoffs.html', 'http://www.nytimes.com/2014/11/28/sports/ncaafootball/mit-is-10-0-and-finding-success-in-ncaa-division-iii-playoffs.html', 'http://www.nytimes.com/2014/11/28/sports/ncaafootball/mit-is-10-0-and-finding-success-in-ncaa-division-iii-playoffs.html?hp&target=comments#commentsContainer', 'http://www.nytimes.com/2014/11/28/business/international/opec-leaves-oil-production-quotas-.......]
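
The output above contains None and empty strings, in-page fragments such as '#top-news', and many repeated URLs. Below is a minimal sketch of tidying such a list before following the links, written for Python 3 with the standard library's urllib.parse. The helper name clean_links is hypothetical, and dropping the fragment part of each URL is an assumption about what a crawler would want, not something the answer states.

```python
from urllib.parse import urldefrag, urljoin


def clean_links(hrefs, base_url):
    """Return absolute, de-duplicated URLs in first-seen order."""
    seen = set()
    cleaned = []
    for href in hrefs:
        if not href:  # skips both None and ''
            continue
        # Resolve relative links against the page URL, then drop '#fragment'
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if absolute not in seen:
            seen.add(absolute)
            cleaned.append(absolute)
    return cleaned


# A few values taken from the output above
hrefs = [
    "#top-news",
    "http://www.nytimes.com/video",
    None,
    "",
    "http://www.nytimes.com/video",
    "http://www.nytimes.com/pages/opinion/index.html#columnists",
]
print(clean_links(hrefs, "http://www.nytimes.com"))
```

Keeping first-seen order makes the cleaned list easy to compare with the raw one. If fragments matter for a given site, the urldefrag call can be left out.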