Python Scrapy: scraping dynamic information

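I am trying to scrape information from http://www.qchp.org.qa/en/Pages/searchpractitioners.aspx. I want to do the following:
- Select "Dentist" from the dropdown at the top of the page.
- The information at the bottom of the page then changes dynamically using JavaScript.
- Click the hyperlink on a practitioner's name, which opens a popup.
- Save all the information for each practitioner in a JSON/CSV file.
- Also save the information from the other pages linked at the bottom of the page, which change the information in the div.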

I am very new to Scrapy, and I read somewhere that you need Selenium for dynamic information, so I am using Selenium inside my Scrapy application. I am not sure whether that is right, and I have no idea what the best way to do this is. This is the code I have so far. I get this error from sch_spider.py:

line 21, in DmozSpider 
    all_options = element.find_elements_by_tag_name("option") 
NameError: name 'element' is not defined

sch_spider.py

from scrapy.spider import Spider 
from scrapy.selector import Selector 
from selenium import webdriver 
from selenium.webdriver.common.keys import Keys 
from scrapytutorial.items import SchItem 
from selenium.webdriver.support.ui import Select 

class DmozSpider(Spider): 
    name = "sch" 

    driver = webdriver.Firefox() 
    driver.get("http://www.qchp.org.qa/en/Pages/searchpractitioners.aspx") 
    select = Select(driver.find_element_by_name('ctl00$m$g_28bc0e11_4b8f_421f_84b7_d671de504bc3$ctl00$drp_practitionerType')) 
    all_options = element.find_elements_by_tag_name("option") 

    for option in all_options:
        if option.get_attribute("value") == "4":  # Dentist
            option.click()
        ends
        break

    driver.find_element_by_name("ctl00$m$g_28bc0e11_4b8f_421f_84b7_d671de504bc3$ctl00$Searchbtn").click() 


    def parse(self, response):

        all_docs = element.find_elements_by_tag_name("td")
        for name in all_docs:
            name.click()
            alert = driver.switch_to_alert()
            sel = Selector(response)
            ma = sel.xpath('//table')
            items = []
            for site in ma:
                item = SchItem()
                item['name'] = site.xpath("//span[@id='PractitionerDetails1_lbl_Name']/text()").extract()
                item['profession'] = site.xpath("//span[@id='PractitionerDetails1_lbl_Profession']/text()").extract()
                item['scope_of_practise'] = site.xpath("//span[@id='PractitionerDetails1_lbl_sop']/text()").extract()
                item['instituition'] = site.xpath("//span[@id='PractitionerDetails1_lbl_institution']/text()").extract()
                item['license'] = site.xpath("//span[@id='PractitionerDetails1_lbl_LicenceNo']/text()").extract()
                item['license_expiry_date'] = site.xpath("//span[@id='PractitionerDetails1_lbl_LicenceExpiry']/text()").extract()
                item['qualification'] = site.xpath("//span[@id='PractitionerDetails1_lbl_Qualification']/text()").extract()

                items.append(item)
            return items

items.py

from scrapy.item import Item, Field 

class SchItem(Item): 

    name = Field() 
    profession = Field() 
    scope_of_practise = Field() 
    instituition = Field() 
    license = Field() 
    license_expiry_date = Field() 
    qualification = Field()

Source

2014-06-06 James L.

I'm not looking for a code review. I have an error and I'm looking for a solution.

You need to send a POST request to the server. [This answer here](http://stackoverflow.com/questions/10218581/using-scrapy-to-scrap-asp-net-website-with-javascript-buttons-and-ajax-requests) should be a good start. – agstudy
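To illustrate the comment above: an ASP.NET WebForms page usually expects its hidden state fields (such as `__VIEWSTATE` and `__EVENTVALIDATION`) to be posted back together with the dropdown value. Below is a minimal sketch using only the standard library. The sample markup and its values are made up, the helper names `HiddenInputParser` and `build_postback` are hypothetical, the dropdown and button names come from the question, and the button's submitted value is a guess. Whether the real site accepts such a request has not been verified.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs of every <input type="hidden"> on a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_postback(html, extra_fields):
    """Return the postback form data: hidden fields plus our own values."""
    parser = HiddenInputParser()
    parser.feed(html)
    payload = dict(parser.fields)
    payload.update(extra_fields)
    return payload


# Made-up sample markup standing in for the real search page.
SAMPLE_HTML = """
<form method="post" action="searchpractitioners.aspx">
  <input type="hidden" name="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__EVENTVALIDATION" value="xyz789" />
</form>
"""

# Control-name prefix copied from the question's code.
PREFIX = "ctl00$m$g_28bc0e11_4b8f_421f_84b7_d671de504bc3$ctl00$"

payload = build_postback(
    SAMPLE_HTML,
    {
        PREFIX + "drp_practitionerType": "4",  # "4" is Dentist in the question
        PREFIX + "Searchbtn": "Search",  # assumed button value
    },
)
body = urlencode(payload)  # form-encoded body for the POST request
```

In Scrapy itself, `FormRequest.from_response(response, formdata=...)` picks up the hidden fields of the form in the response in a similar way, so the manual parsing here is only for illustration.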

Shouldn't you change element.find_elements.. to select.find_element.. in the code below?

select = Select(driver.find_element_by_name('ctl00$m$g_28bc0e11_4b8f_421f_84b7_d671de504bc3$ctl00$drp_practitionerType')) 
    all_options = element.find_elements_by_tag_name("option")

Shouldn't you be using select.options?
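The `NameError` arises because the dropdown is stored in `select`, while the next line refers to `element`, a name that was never assigned. Below is a minimal sketch of the suggestion above, assuming `select` is a Selenium `Select` object, which exposes its `<option>` elements as `select.options`. The helper name `click_option_with_value` is hypothetical, and the function relies only on `options`, `get_attribute` and `click`.

```python
def click_option_with_value(select, value):
    """Click the first option whose "value" attribute equals `value`.

    Returns True if an option was clicked, False otherwise.
    """
    for option in select.options:
        if option.get_attribute("value") == value:
            option.click()
            return True
    return False


# Intended use with the question's code (needs Selenium and a browser):
# DROPDOWN_NAME = 'ctl00$m$g_28bc0e11_4b8f_421f_84b7_d671de504bc3$ctl00$drp_practitionerType'
# select = Select(driver.find_element_by_name(DROPDOWN_NAME))
# click_option_with_value(select, "4")  # "4" is Dentist in the question
```

Selenium's `Select` also provides `select_by_value("4")`, which should achieve the same thing in a single call.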

Source

2014-06-06 18:32:30 Biswanath
