Why is my data not being saved to xls?

I have written a very simple web scraper using scrapy. I want to save the scraped data to an .xls file, because I already have a module that reads the xls and sorts the scraped data. But I have hit what feels like a silly stumbling block in actually saving the xls.

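The spider itself works (it crawls and scrapes the required data).
The .xls file is created and initialized correctly.
The scraped data is written to the xls after each item is scraped.

However, wherever I put the save statement, it seems to run before the actual web scraping starts, leaving me with an initialized (first row filled in with titles) but otherwise empty spreadsheet. I think I am right in saying that I just need to add

global newDb
newDb.save('./products_out.xls')

in the right place. Here is what I have so far (website removed to save an innocent server):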

# encoding=utf-8 
from scrapy.contrib.spiders import CrawlSpider, Rule 
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor 
from scrapy.selector import HtmlXPathSelector 
from scrapy.item import Item, Field 
from xlwt import Workbook 

# Working row on new spreadsheet 
row_number = 0 

# Create new spreadsheet 
newDb = Workbook(encoding='utf-8') 
newFile = newDb.add_sheet('Sheet1') 
values = ['product','description','image'] 

class TestSpider(CrawlSpider): 
    # Initiate new spreadsheet 
    global newFile 
    global values 
    global row_number 
    for cell in range(len(values)):
        newFile.write(row_number, cell, values[cell])
    row_number = row_number + 1

    # Initiate Spider 
    name = "Test" 
    allowed_domains = [] 
    start_urls = ["http://www.website.to/scrape",] 
    rules = (Rule(SgmlLinkExtractor(restrict_xpaths="//div[@class='content']/h3"), callback='parse_product'),) 

    def parse_product(self, response): 
        hxs = HtmlXPathSelector(response)
        item = TestItem()
        item['product'] = hxs.select('//div [@class = "col-right"][1]/table/tr[1]/td/text()').extract()
        item['description'] = hxs.select('//div[@class="columns"][1]/div [@class = "col-right"]/p/text()').extract()
        item['image'] = hxs.select('//img /@src').extract()

        global values
        global newFile
        global row_number

        # This is where products are written to the xls
        for title in values:
            # test to increase row_number, at the start of each new product
            if title == "product":
                row_number = row_number + 1
            try:
                newFile.write(row_number, values.index(title), item[title])
            except:
                newFile.write(row_number, values.index(title), '')

class TestItem(Item): 
    product = Field() 
    description = Field() 
    image = Field()

No matter where I add the save call, print statements indicate that the order of operations is always: create xls -> initialize xls -> save xls -> scrape and write to xls -> close without saving.

I am pretty new to developing and I am at a loss with this, so any advice would be gratefully received.
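One possible explanation for that ordering, offered as an assumption rather than a confirmed diagnosis: statements placed in a class body (or at module level) run once when the class is defined, which happens when the spider module is imported, while callbacks such as parse_product only run later during the crawl. The sketch below uses a hypothetical DemoSpider and a plain list of event names in place of the real workbook, purely to show the order.

```python
events = []


class DemoSpider(object):
    # Statements in a class body run once, when the class is defined
    # (i.e. at import time), not when crawling starts.
    events.append('initialize xls')
    events.append('save xls')  # stands in for a newDb.save(...) placed here

    def parse_product(self):
        # Callbacks only run later, once per scraped page.
        events.append('scrape and write')


spider = DemoSpider()
spider.parse_product()
print(events)  # ['initialize xls', 'save xls', 'scrape and write']
```

If this is what is happening, a save call in the class body or at module level would always write the file before any product rows exist.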

Source

2013-02-08 user2051497

Are you getting the data at all? – TheSentinel

Yes (when it is pointed at the correct start_url). If I add 'return item' to the end of parse_product, the collected data is printed to the terminal as it is scraped. – user2051497

Ideally, you should create a custom item pipeline class (see the scrapy documentation for examples) and put all of the file-writing code in there.
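A minimal sketch of what such a pipeline could look like, assuming the same three columns as the question. The class name XlsExportPipeline, the path argument, and the workbook_factory hook (which exists only so the class can be exercised without xlwt) are hypothetical. open_spider, process_item and close_spider are the hook methods scrapy calls on item pipelines.

```python
class XlsExportPipeline(object):
    """Hypothetical pipeline: write one row per item, save once at the end."""

    columns = ['product', 'description', 'image']

    def __init__(self, path='./products_out.xls', workbook_factory=None):
        self.path = path
        # Assumption: optional factory so a stand-in workbook can be injected.
        self.workbook_factory = workbook_factory

    def open_spider(self, spider):
        # Called once when the spider starts: create the sheet and header row.
        if self.workbook_factory is None:
            from xlwt import Workbook
            self.workbook = Workbook(encoding='utf-8')
        else:
            self.workbook = self.workbook_factory()
        self.sheet = self.workbook.add_sheet('Sheet1')
        for col, title in enumerate(self.columns):
            self.sheet.write(0, col, title)
        self.next_row = 1

    def process_item(self, item, spider):
        # Called once per item returned by the spider callback.
        for col, title in enumerate(self.columns):
            value = item.get(title, '')
            # extract() returns a list; join it into a single cell value.
            if isinstance(value, (list, tuple)):
                value = ' '.join(value)
            self.sheet.write(self.next_row, col, value)
        self.next_row += 1
        return item

    def close_spider(self, spider):
        # Called once after the crawl finishes, so every row is already written.
        self.workbook.save(self.path)
```

For this to run, parse_product would need to return the item, and the pipeline would need to be enabled through the ITEM_PIPELINES setting of the project. Saving in close_spider is the design choice that matters here: the file is written after the crawl, not at import time.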

Source

2013-02-08 21:36:48 Talvalin

OK, great, thanks. – user2051497
