2017-11-27 1 views

I'm writing some scraping code and ran into the error "TypeError: Cannot mix str and non-str arguments". My code is as follows.

# -*- coding: utf-8 -*- 
import scrapy 
from myproject.items import Headline 


class NewsSpider(scrapy.Spider): 
    name = 'IC' 
    allowed_domains = ['kosoku.jp'] 
    start_urls = ['http://kosoku.jp/ic.php'] 

    def parse(self, response):
        """
        extract target urls and combine them with the main domain
        """
        for url in response.css('table a::attr("href")'):
            yield(scrapy.Request(response.urljoin(url), self.parse_topics))

    def parse_topics(self, response):
        """
        pick up necessary information
        """
        item = Headline()
        item["name"] = response.css("h2#page-name ::text").re(r'.*(インターチェンジ)')
        item["road"] = response.css("div.ic-basic-info-left div:last-of-type ::text").re(r'.*道$')
        yield item

When I run each step individually in the shell I get the correct response, but when I run the spider as a whole it fails with this traceback:

2017-11-27 18:26:17 [scrapy.core.scraper] ERROR: Spider error processing <GET http://kosoku.jp/ic.php> (referer: None) 
Traceback (most recent call last): 
    File "/Users/sonogi/envs/scrapy/lib/python3.5/site-packages/scrapy/utils/defer.py", line 102, in iter_errback 
    yield next(it) 
    File "/Users/sonogi/envs/scrapy/lib/python3.5/site-packages/scrapy/spidermiddlewares/offsite.py", line 29, in process_spider_output 
    for x in result: 
    File "/Users/sonogi/envs/scrapy/lib/python3.5/site-packages/scrapy/spidermiddlewares/referer.py", line 339, in <genexpr> 
    return (_set_referer(r) for r in result or()) 
    File "/Users/sonogi/envs/scrapy/lib/python3.5/site-packages/scrapy/spidermiddlewares/urllength.py", line 37, in <genexpr> 
    return (r for r in result or() if _filter(r)) 
    File "/Users/sonogi/envs/scrapy/lib/python3.5/site-packages/scrapy/spidermiddlewares/depth.py", line 58, in <genexpr> 
    return (r for r in result or() if _filter(r)) 
    File "/Users/sonogi/scraping/myproject/myproject/spiders/IC.py", line 16, in parse 
    yield(scrapy.Request(response.urljoin(url), self.parse_topics)) 
    File "/Users/sonogi/envs/scrapy/lib/python3.5/site-packages/scrapy/http/response/text.py", line 82, in urljoin 
    return urljoin(get_base_url(self), url) 
    File "/opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/parse.py", line 424, in urljoin 
    base, url, _coerce_result = _coerce_args(base, url) 
    File "/opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/parse.py", line 120, in _coerce_args 
    raise TypeError("Cannot mix str and non-str arguments") 
TypeError: Cannot mix str and non-str arguments 
2017-11-27 18:26:17 [scrapy.core.engine] INFO: Closing spider (finished) 

I'm confused, and I would appreciate any help.

Answer


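According to the Scrapy documentation, the .css(selector) method you are using returns a SelectorList instance. Iterating over it yields Selector objects, not strings, so response.urljoin(url) ends up passing a non-str value to urllib's urljoin, which raises the TypeError. If you want the actual (unicode) string version of each URL, call the extract() method on the result: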

def parse(self, response):
    # .extract() turns the SelectorList into a list of plain str hrefs
    for url in response.css('table a::attr("href")').extract():
        yield scrapy.Request(response.urljoin(url), self.parse_topics)
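To see why the original loop fails without running a spider, here is a minimal sketch that uses only the standard library. FakeSelector, join_raw, join_extracted and the href "/ic/example" are hypothetical names made up for illustration. FakeSelector only imitates the one property that matters here: it wraps a string but is not itself a str.

```python
from urllib.parse import urljoin


class FakeSelector:
    """Hypothetical stand-in for a Scrapy Selector: wraps a value, is not a str."""

    def __init__(self, value):
        self._value = value

    def extract(self):
        # Like calling extract() on a single selector: return the wrapped string.
        return self._value


BASE_URL = "http://kosoku.jp/ic.php"
href = FakeSelector("/ic/example")  # made-up href, for illustration only


def join_raw(base, selector):
    # Mirrors the question's code: the wrapper object itself is passed on.
    return urljoin(base, selector)


def join_extracted(base, selector):
    # Mirrors the fix: the extracted str is passed on.
    return urljoin(base, selector.extract())


try:
    join_raw(BASE_URL, href)
    raw_error = None
except TypeError as exc:
    # urljoin refuses to combine a str base with a non-str url.
    raw_error = exc

joined = join_extracted(BASE_URL, href)
print(type(raw_error).__name__)
print(joined)
```

The first call fails in the same place as the traceback in the question (urllib's argument check), while the second succeeds because both arguments are plain strings.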

Thank you so much!
