2017-11-09 1 views
0

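Multiple sites - Python web scraping

I have the following code, which I got working with help from people here. I looked for an existing thread that answers my question but could not find one, so I am asking here.

How can I add multiple sites to this code so that the results are written correctly to a CSV file?

These are the sites I would like to add (there will be three or more additional ones). Thanks for your help.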

'https://www.publicstorage.com/north-carolina/self-storage-charlotte-nc/28205-self-storage/1796?PID=PSLocalSearch&CID=1341&CHID=LL'

'https://www.publicstorage.com/north-carolina/self-storage-charlotte-nc/28215-self-storage/2079?PID=PSLocalSearch&CID=1341&CHID=LL'

'https://www.publicstorage.com/north-carolina/self-storage-charlotte-nc/28213-self-storage/2441?PID=PSLocalSearch&CID=1341&CHID=LL'

Here is the code:

from urllib.request import urlopen as uReq 
from bs4 import BeautifulSoup as soup 


#setting my_url to the website
my_url = 'https://www.publicstorage.com/north-carolina/self-storage-charlotte-nc/28206-self-storage/2334?lat=35.23552&lng=-80.83296&clp=1&sp=Charlotte|35.2270869|-80.8431267&ismi=1'

#Opening up connection, grabbing the page 
uClient = uReq(my_url) 

#naming uClient to page_html 
page_html = uClient.read() 

#closing uClient 
uClient.close() 

#this does my html parsing 
page_soup = soup(page_html, "html.parser") 

#setting container to capture where the actual info is using inspect element 
#grabs each product 
containers = page_soup.findAll("li",{"class":"srp_res_row plp"}) 
store_locator = page_soup.findAll("div", {"itemprop":"address"}) 

filename = "product.csv" 
f = open(filename, "w") 

headers = "unit_size, size_dim1, unit_type, online_price, reg_price, street_address, store_city\n"

f.write(headers) 

for container in containers:
    for store_location in store_locator:
        street_address = store_location.findAll("span", {"itemprop":"streetAddress"})
        store_city = store_location.findAll("span", {"itemprop":"addressLocality"})
    title_container = container.div.div
    unit_size = title_container.text
    size_dim = container.findAll("div", {"class":"srp_label srp_font_14"})
    unit_container = container.li
    unit_type = unit_container.text
    online_price = container.findAll("div", {"class":"srp_label alt-price"})
    reg_price = container.findAll("div", {"class":"reg-price"})


    for item in zip(unit_size,size_dim,unit_container,online_price,reg_price,street_address,store_city):
        csv=item[0] + "," + item[1].text + "," + item[2] + "," + item[3].text + "," + item[4].text + "," + item[5].text + "," + item[6].text + "\n"
        f.write(csv)

Here is the HTML:

<li class="srp_res_row plp"> 
 
    <div class="srp_res_clm srp_clm160"> 
 
     <div class="srp_label plp">Small</div> 
 
     <div class="srp_v-space_3"></div> 
 
     <div class="srp_label srp_font_14" style="padding-left: 5px;">5' x 10'</div> 
 
     <div class="srp_v-space_3"></div> 
 
    </div> 
 
    <div class="srp_res_clm srp_clm120"> 
 
     <ul class="srp_list"> 
 
      <li>Outside unit/Drive-up access</li> 
 
     </ul> 
 
    </div> 
 
    <div class="srp_res_clm srp_clm90"> 
 
     <div class="srp_label">$1<span class="srp_label_symbol">†</span></div> 
 
     <div class="srp_v-space_10">1st Month</div> 
 
    </div> 
 
    <div class="srp_res_clm srp_clm90"> 
 
     <div class="srp_label alt-price">$56/mo.</div> 
 
     <div class="online-special">Online Special<span class="srp_label_symbol">†</span></div> 
 
     <div class="srp_v-space_15"></div> 
 
     <div class="reg-price">$70 In-store</div> 
 
    </div> 
 
    <div class="srp_res_clm srp_clm100 srp_vcenter"><a class="srp_continue unit-no-deposit" data-deposit-amount="0" data-deposit-days="0" data-features="Outside unit/Drive-up access" data-marketing-size="5x10" data-ppk="altproduct_price" data-promotionid="132" data-siteid="2334" data-size-description="5' x 10'" data-sizeid="613573" data-wc2-unit="false" href="/ReservationDetails.aspx?st=2334&amp;sz=613573&amp;key=[rnd]&amp;location=&amp;plp=1&amp;rk=&amp;ismi=1&amp;sp=Charlotte%7c35.2270869%7c-80.8431267&amp;clp=1"><img alt="Continue" src="/images/srp-cont-new-80.png" style="width: 80px; height: 32px"/></a></div> 
 
</li>
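
The price cells in this markup hold decorated text such as `$56/mo.` and `$70 In-store`. If plain numbers are wanted in the CSV, a small helper along these lines could strip the decoration. This is only a sketch: `parse_price` is a hypothetical name that does not appear in the question, and the "Call for price" input is an invented example of a cell with no amount.

```python
import re
from typing import Optional

# Matches the first dollar amount, e.g. "$56", "$1,250" or "$70.50".
_PRICE_RE = re.compile(r"\$\s*(\d+(?:,\d{3})*(?:\.\d+)?)")


def parse_price(text: str) -> Optional[float]:
    """Return the first dollar amount found in text, or None if there is none."""
    match = _PRICE_RE.search(text)
    if match is None:
        return None
    # Drop thousands separators before converting.
    return float(match.group(1).replace(",", ""))


print(parse_price("$56/mo."))         # 56.0
print(parse_price("$70 In-store"))    # 70.0
print(parse_price("Call for price"))  # None (invented input, no amount present)
```

It would be applied to the `.text` of the `alt-price` and `reg-price` elements before writing a row.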

+0

You can store the URLs in a list, loop over each one, scrape it, and save the results to the CSV. – Ali

+0

@Ali - Thanks for the quick reply. Could you show me how to do this? –

+0

See the answer below. – Ali

Answers

0

Code:

from urllib.request import urlopen as uReq 
from bs4 import BeautifulSoup as soup 

# URLs of the pages to scrape
urls = ['https://www.publicstorage.com/north-carolina/self-storage-charlotte-nc/28206-self-storage/2334?lat=35.23552&lng=-80.83296&clp=1&sp=Charlotte|35.2270869|-80.8431267&ismi=1' 
    , 'https://www.publicstorage.com/north-carolina/self-storage-charlotte-nc/28205-self-storage/1796?PID=PSLocalSearch&CID=1341&CHID=LL' 
    , 'https://www.publicstorage.com/north-carolina/self-storage-charlotte-nc/28215-self-storage/2079?PID=PSLocalSearch&CID=1341&CHID=LL' 
    , 'https://www.publicstorage.com/north-carolina/self-storage-charlotte-nc/28213-self-storage/2441?PID=PSLocalSearch&CID=1341&CHID=LL'] 

filename = "product.csv" 
open(filename, 'w').close() 
f = open(filename, "a") 
num = 0 

headers = "unit_size, size_dim1, unit_type, online_price, reg_price, street_address, store_city\n" 

f.write(headers) 

for my_url in urls: 
    # Opening up connection, grabbing the page 
    uClient = uReq(my_url) 

    # naming uClient to page_html 
    page_html = uClient.read() 

    # closing uClient 
    uClient.close() 

    # this does my html parsing 
    page_soup = soup(page_html, "html.parser") 

    # setting container to capture where the actual info is using inspect element 
    # grabs each product 
    containers = page_soup.findAll("li", {"class": "srp_res_row plp"}) 
    store_locator = page_soup.findAll("div", {"itemprop": "address"}) 

    # marker line so each site's rows can be told apart in the file
    f.write("website " + str(num) + ": \n")
    for container in containers:
        for store_location in store_locator:
            street_address = store_location.findAll("span", {"itemprop": "streetAddress"})
            store_city = store_location.findAll("span", {"itemprop": "addressLocality"})
            title_container = container.div.div
            unit_size = title_container.text
            size_dim = container.findAll("div", {"class": "srp_label srp_font_14"})
            unit_container = container.li
            unit_type = unit_container.text
            online_price = container.findAll("div", {"class": "srp_label alt-price"})
            reg_price = container.findAll("div", {"class": "reg-price"})

        # unit_size is a string, so zip() takes only its first character
        # (e.g. "S" for "Small"), which is what appears in the output
        for item in zip(unit_size, size_dim, unit_container, online_price, reg_price, street_address, store_city):
            csv = item[0] + "," + item[1].text + "," + item[2] + "," + item[3].text + "," + item[4].text + "," + item[5].text + "," + item[6].text + "\n"
            f.write(csv)
    num += 1

# flush and release the output file
f.close()

Output (contents of product.csv):

unit_size, size_dim1, unit_type, online_price, reg_price, street_address, store_city 
website 0: 
S,5' x 10',Outside unit/Drive-up access,$55/mo.,$68 In-store,1001 N Tryon St,Charlotte 
M,5' x 15',Outside unit/Drive-up access,$68/mo.,$84 In-store,1001 N Tryon St,Charlotte 
M,10' x 10',Outside unit/Drive-up access,$101/mo.,$126 In-store,1001 N Tryon St,Charlotte 
L,10' x 15',Outside unit/Drive-up access,$154/mo.,$187 In-store,1001 N Tryon St,Charlotte 
L,10' x 25',Outside unit/Drive-up access,$167/mo.,$208 In-store,1001 N Tryon St,Charlotte 
L,10' x 20',Outside unit/Drive-up access,$172/mo.,$209 In-store,1001 N Tryon St,Charlotte 
L,15' x 20',Outside unit/Drive-up access,$193/mo.,$241 In-store,1001 N Tryon St,Charlotte 
website 1: 
S,5' x 5',Outside unit/Drive-up access,$50/mo.,$60 In-store,3710 Monroe Road,Charlotte 
S,5' x 10',Outside unit/Drive-up access,$53/mo.,$66 In-store,3710 Monroe Road,Charlotte 
S,10' x 5',Outside unit/Drive-up access,$55/mo.,$68 In-store,3710 Monroe Road,Charlotte 
M,10' x 10',Outside unit/Drive-up access,$97/mo.,$118 In-store,3710 Monroe Road,Charlotte 
L,10' x 15',Outside unit/Drive-up access,$100/mo.,$124 In-store,3710 Monroe Road,Charlotte 
L,10' x 20',Outside unit/Drive-up access,$128/mo.,$159 In-store,3710 Monroe Road,Charlotte 
M,10' x 10',Climate Controlled,$129/mo.,$157 In-store,3710 Monroe Road,Charlotte 
L,20' x 30',Outside unit/Drive-up access,$292/mo.,$356 In-store,3710 Monroe Road,Charlotte 
website 2: 
S,5' x 10',Outside unit/Drive-up access,$36/mo.,$45 In-store,5301 N Sharon Amity Rd,Charlotte 
S,10' x 5',Outside unit/Drive-up access,$36/mo.,$45 In-store,5301 N Sharon Amity Rd,Charlotte 
S,5' x 5',Outside unit/Drive-up access,$42/mo.,$53 In-store,5301 N Sharon Amity Rd,Charlotte 
M,10' x 10',Outside unit/Drive-up access,$80/mo.,$99 In-store,5301 N Sharon Amity Rd,Charlotte 
L,10' x 15',Outside unit/Drive-up access,$87/mo.,$108 In-store,5301 N Sharon Amity Rd,Charlotte 
L,10' x 20',Outside unit/Drive-up access,$100/mo.,$124 In-store,5301 N Sharon Amity Rd,Charlotte 
L,20' x 10',Outside unit/Drive-up access,$100/mo.,$125 In-store,5301 N Sharon Amity Rd,Charlotte 
M,10' x 10',Climate Controlled,$112/mo.,$139 In-store,5301 N Sharon Amity Rd,Charlotte 
L,10' x 25',Outside unit/Drive-up access,$121/mo.,$153 In-store,5301 N Sharon Amity Rd,Charlotte 
L,20' x 10',Climate Controlled,$123/mo.,$153 In-store,5301 N Sharon Amity Rd,Charlotte 
L,20' x 20',Outside unit/Drive-up access,$135/mo.,$168 In-store,5301 N Sharon Amity Rd,Charlotte 
website 3: 
S,3' x 3',Inside unit/1st Floor,$17/mo.,$22 In-store,4730 N Tryon St,Charlotte 
S,5' x 5',Outside unit/Drive-up access,$35/mo.,$43 In-store,4730 N Tryon St,Charlotte 
S,5' x 10',Outside unit/Drive-up access,$39/mo.,$49 In-store,4730 N Tryon St,Charlotte 
S,10' x 5',Outside unit/Drive-up access,$40/mo.,$50 In-store,4730 N Tryon St,Charlotte 
M,5' x 15',Outside unit/Drive-up access,$65/mo.,$81 In-store,4730 N Tryon St,Charlotte 
M,20' x 5',Outside unit/Drive-up access,$65/mo.,$81 In-store,4730 N Tryon St,Charlotte 
M,10' x 10',Outside unit/Drive-up access,$66/mo.,$82 In-store,4730 N Tryon St,Charlotte 
L,10' x 15',Outside unit/Drive-up access,$84/mo.,$105 In-store,4730 N Tryon St,Charlotte 
L,10' x 20',Outside unit/Drive-up access,$136/mo.,$169 In-store,4730 N Tryon St,Charlotte 
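
The `website N:` marker lines make the file harder to load as a single table, and joining fields with `+ ","` would break a row if a value ever contained a comma. One possible variation, sketched below, writes the source URL as its own column with the standard `csv` module and skips a URL that fails to load instead of stopping the whole run. The names `write_sites`, `scrape_rows`, `fake_scrape`, the `source_url` column, and the `example.com` URLs are assumptions made for illustration. `scrape_rows` stands in for the BeautifulSoup extraction above, and the sample row reuses values from the HTML in the question.

```python
import csv
import io
from typing import Callable, Iterable, List, Sequence, TextIO
from urllib.error import URLError

HEADER = ["source_url", "unit_size", "size_dim1", "unit_type",
          "online_price", "reg_price", "street_address", "store_city"]


def write_sites(urls: Iterable[str],
                scrape_rows: Callable[[str], Iterable[Sequence[str]]],
                out: TextIO) -> List[str]:
    """Write rows for every URL into one CSV table; return the URLs that failed."""
    writer = csv.writer(out)
    writer.writerow(HEADER)
    failed = []
    for url in urls:
        try:
            rows = list(scrape_rows(url))
        except (URLError, ValueError):
            # urlopen raises URLError on network/HTTP errors, ValueError on a malformed URL
            failed.append(url)
            continue
        for row in rows:
            # csv.writer quotes any field that contains a comma or a double quote
            writer.writerow([url, *row])
    return failed


def fake_scrape(url: str) -> List[Sequence[str]]:
    """Stand-in for the real page scraping, used only to demonstrate the flow."""
    if "bad" in url:
        raise URLError("unreachable")
    return [["Small", "5' x 10'", "Outside unit/Drive-up access",
             "$56/mo.", "$70 In-store", "1001 N Tryon St", "Charlotte"]]


buffer = io.StringIO()
failed_urls = write_sites(["https://example.com/a", "https://example.com/bad"],
                          fake_scrape, buffer)
print(buffer.getvalue())
print("failed:", failed_urls)
```

When writing to a real file rather than a `StringIO`, the `csv` documentation recommends opening it with `newline=""`, for example `open("product.csv", "w", newline="")`.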
+0

@Ali - Thank you! –
