2011-11-20

Answers


Instead of CrawlSpider, use BaseSpider and then define either start_requests or start_urls:

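from scrapy.spider import BaseSpider
from scrapy.http import Request
from scrapy.selector import HtmlXPathSelector


class MySpider(BaseSpider):
    name = "myspider"

    def start_requests(self):
        # Seed the crawl explicitly instead of relying on CrawlSpider rules
        return [Request("https://www.example.com",
                        callback=self.parse)]

    def parse(self, response):
        # Safe to use parse as a callback here; CrawlSpider uses parse internally
        hxs = HtmlXPathSelector(response)
        ...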
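Roughly, start_urls and start_requests are two ways of saying the same thing. If you only set start_urls, the spider's default start_requests builds one request per URL, and a request without a callback ends up in parse. The snippet below is a simplified, stdlib-only model of that behaviour, not Scrapy's own code; FakeRequest, ModelSpider, ListSpider, ExplicitSpider and resolve_callback are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Optional


@dataclass
class FakeRequest:
    """Hypothetical stand-in for a Request: a URL plus an optional callback."""
    url: str
    callback: Optional[Callable] = None


class ModelSpider:
    """Simplified, hypothetical model of how a spider seeds its crawl."""
    start_urls: List[str] = []

    def start_requests(self) -> Iterator[FakeRequest]:
        # Modelled default: one request per start URL, no explicit callback.
        for url in self.start_urls:
            yield FakeRequest(url)

    def parse(self, response):
        return response


class ListSpider(ModelSpider):
    # Option 1: only list the URLs and rely on the default start_requests.
    start_urls = ["https://www.example.com"]


class ExplicitSpider(ModelSpider):
    # Option 2: override start_requests and name the callback explicitly.
    def start_requests(self):
        return [FakeRequest("https://www.example.com", callback=self.parse)]


def resolve_callback(spider: ModelSpider, request: FakeRequest) -> Callable:
    # A request without a callback is handled by the spider's parse method.
    return request.callback or spider.parse
```

Both spiders end up sending the same URL to parse; the explicit version just makes the callback visible and lets you attach extra per-request settings if needed.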

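from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor

class ThemenHubSpider(CrawlSpider):
    name = 'themenHub'
    allowed_domains = ['themen.t-online.de']
    start_urls = ["http://themen.t-online.de/themen-a-z/a"]
    # Request links whose URL contains id_<digits>; send each response to parse_news
    rules = [Rule(SgmlLinkExtractor(allow=[r'id_\d+']), 'parse_news')]

    def parse_news(self, response):
        ...  # body not shown in the original answer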
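The allow list in the rule above is a list of regular expressions that a link's absolute URL has to contain; they are applied with a search, not a full match. A stdlib-only sketch of that filtering step, assuming hypothetical article URLs of the form .../id_<digits>... (allowed and filter_links are illustrative helpers, not Scrapy API):

```python
import re
from typing import Iterable, List

# Same pattern as the Rule's allow=['id_\d+'], written as a raw string.
ALLOW_PATTERNS = [re.compile(r"id_\d+")]


def allowed(url: str, patterns=ALLOW_PATTERNS) -> bool:
    # A URL passes if any allow pattern is found anywhere in it (re.search).
    return any(p.search(url) for p in patterns)


def filter_links(urls: Iterable[str]) -> List[str]:
    # Keep only the links that the allow list accepts, preserving order.
    return [u for u in urls if allowed(u)]


# Hypothetical links as they might appear on the A-Z index page.
links = [
    "http://themen.t-online.de/themen-a-z/a",
    "http://themen.t-online.de/some-topic/id_42/index",
]
print(filter_links(links))
```

The start page itself contains no id_ followed by digits, so it would not pass this filter; CrawlSpider handles responses for start_urls separately (through parse_start_url) rather than through the rules.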