2016-09-30

Scrapy (Python) TypeError: unhashable type: 'list'

I have this simple Scrapy spider. However, when I use response.urljoin(port_homepage_url) in this part of the code, I get this error.

import re

import scrapy
from vesseltracker.items import VesseltrackerItem


class GetVessel(scrapy.Spider):
    name = "getvessel"
    allowed_domains = ["marinetraffic.com"]
    start_urls = [
        'http://www.marinetraffic.com/en/ais/index/ports/all/flag:AE',
    ]

    def parse(self, response):
        item = VesseltrackerItem()
        for ports in response.xpath('//table/tr[position()>1]'):
            item['port_name'] = ports.xpath('td[2]/a/text()').extract()
            port_homepage_url = ports.xpath('td[7]/a/@href').extract()
            port_homepage_url = response.urljoin(port_homepage_url)
            yield scrapy.Request(port_homepage_url, callback=self.parse, meta={'item': item})

What could be wrong?

Here is the error log:

2016-09-30 17:17:13 [scrapy] DEBUG: Crawled (200) <GET http://www.marinetraffic.com/robots.txt> (referer: None)
2016-09-30 17:17:14 [scrapy] DEBUG: Crawled (200) <GET http://www.marinetraffic.com/en/ais/index/ports/all/flag:AE> (referer: None)
2016-09-30 17:17:14 [scrapy] ERROR: Spider error processing <GET http://www.marinetraffic.com/en/ais/index/ports/all/flag:AE> (referer: None)
Traceback (most recent call last):
  File "/Users/noussh/python/env/lib/python2.7/site-packages/scrapy/utils/defer.py", line 102, in iter_errback
    yield next(it)
  File "/Users/noussh/python/env/lib/python2.7/site-packages/scrapy/spidermiddlewares/offsite.py", line 29, in process_spider_output
    for x in result:
  File "/Users/noussh/python/env/lib/python2.7/site-packages/scrapy/spidermiddlewares/referer.py", line 22, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "/Users/noussh/python/env/lib/python2.7/site-packages/scrapy/spidermiddlewares/urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/Users/noussh/python/env/lib/python2.7/site-packages/scrapy/spidermiddlewares/depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/Users/noussh/python/vesseltracker/vesseltracker/spiders/marinetraffic.py", line 19, in parse
    port_homepage_url = response.urljoin(port_homepage_url)
  File "/Users/noussh/python/env/lib/python2.7/site-packages/scrapy/http/response/text.py", line 78, in urljoin
    return urljoin(get_base_url(self), url)
  File "/usr/local/Cellar/python/2.7.12/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urlparse.py", line 261, in urljoin
    urlparse(url, bscheme, allow_fragments)
  File "/usr/local/Cellar/python/2.7.12/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urlparse.py", line 143, in urlparse
    tuple = urlsplit(url, scheme, allow_fragments)
  File "/usr/local/Cellar/python/2.7.12/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urlparse.py", line 176, in urlsplit
    cached = _parse_cache.get(key, None)
TypeError: unhashable type: 'list'
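The last frame shows where the list actually breaks: Python 2.7's urlsplit memoises parsed URLs in a dict (_parse_cache), and the lookup key is a tuple built from the arguments, so the URL itself must be hashable. The sketch below is a toy stand-in for that lookup, not the real urlparse code; the key layout and the href value are assumptions for illustration.

```python
# Toy stand-in (assumption, not the real implementation) for the cache
# lookup in Python 2.7's urlparse.urlsplit: results are memoised in a dict,
# so the key built from the arguments has to be hashable.
_parse_cache = {}


def cached_lookup(url, scheme="", allow_fragments=True):
    # Roughly mirrors the key urlsplit builds from its arguments.
    key = (url, scheme, allow_fragments, type(url), type(scheme))
    # dict.get() hashes the key; hashing a tuple hashes each element,
    # and a list has no hash, so this raises TypeError for a list URL.
    return _parse_cache.get(key, None)


def error_for(url):
    """Return the TypeError message raised for url, or None if none."""
    try:
        cached_lookup(url)
    except TypeError as exc:
        return str(exc)
    return None


print(error_for("/en/ais/details/ports/1234"))    # a str is hashable
print(error_for(["/en/ais/details/ports/1234"]))  # a list is not
```

The string lookup simply misses the empty cache, while the list version fails with the same message as in the log.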

Answer


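ports.xpath('td[7]/a/@href').extract() returns a list of strings (one per matching href), not a single string, so when you try to urljoin it, it fails. Use extract_first() instead; it returns the first match as a string, or None if nothing matched: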

port_homepage_url = ports.xpath('td[7]/a/@href').extract_first() 
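To see the difference without running a spider, here is a stdlib-only sketch. It uses Python 3's urllib.parse.urljoin rather than Scrapy, and extract_first() below is a hypothetical helper that only mimics Scrapy's method on a plain list; the href value is made up. Note that on Python 3 passing a list still raises TypeError, just with a different message than the Python 2.7 one in the question.

```python
from urllib.parse import urljoin

BASE = "http://www.marinetraffic.com/en/ais/index/ports/all/flag:AE"


def extract_first(values, default=None):
    """Hypothetical helper mimicking extract_first() on a plain list:
    the first match, or default when the list is empty."""
    return values[0] if values else default


def port_url(hrefs):
    """Join the first href against BASE; None if the row had no link."""
    href = extract_first(hrefs)
    if href is None:
        return None
    return urljoin(BASE, href)


# What .extract() would hand back for one table row (hypothetical value).
hrefs = ["/en/ais/details/ports/1234"]

print(port_url(hrefs))  # joined absolute URL
print(port_url([]))     # None: row without a link

try:
    urljoin(BASE, hrefs)  # passing the whole list, as in the question
except TypeError as exc:
    print("TypeError:", exc)
```

Because extract_first() can return None, a spider using it would typically guard with `if port_homepage_url:` before calling response.urljoin() and yielding the request.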

Thanks @alecxe. It worked!
