2016-08-17 7 views

New user here. I'm starting to get the hang of Python syntax, but for loops keep throwing me off. I understand each scenario I've come across so far (and my earlier examples), but I can't come up with a loop for my current scenario.

I'm playing around with BeautifulSoup to extract features from app stores.

I created a list of both Google Play and iTunes URLs to play around with.

list = {"https://play.google.com/store/apps/details?id=com.tov.google.ben10Xenodromeplus&hl=en", 
"https://play.google.com/store/apps/details?id=com.doraemon.doraemonRepairShopSeasons&hl=en", 
"https://play.google.com/store/apps/details?id=com.KnowledgeAdventure.SchoolOfDragons&hl=en", 
"https://play.google.com/store/apps/details?id=com.turner.stevenrpg&hl=en", 
"https://play.google.com/store/apps/details?id=com.indigokids.mimdoctor&hl=en", 
"https://play.google.com/store/apps/details?id=com.rovio.gold&hl=en", 
"https://itunes.apple.com/us/app/angry-birds/id343200656?mt=8", 
"https://itunes.apple.com/us/app/doodle-jump/id307727765?mt=8", 
"https://itunes.apple.com/us/app/tiny-wings/id417817520?mt=8", 
"https://itunes.apple.com/us/app/flick-home-run-!/id454086751?mt=8", 
"https://itunes.apple.com/us/app/bike-race-pro/id510461370?mt=8"} 

To test BeautifulSoup (bs in my code), I used one app from each store:

gptest = bs(urllib.urlopen("https://play.google.com/store/apps/details?id=com.rovio.gold&hl=en")) 

ios = bs(urllib.urlopen("https://itunes.apple.com/us/app/doodle-jump/id307727765?mt=8")) 

To find an app's category on iTunes, I use:

print ios.find(itemprop="applicationCategory").get_text() 

And on Google Play:

print gptest.find(itemprop="genre").get_text() 

With this newfound confidence, I wanted to iterate through my entire list and output these values, but then I realized I'm bad at for loops. Here's my attempt:

def opensite(): 
    for item in list: 
        bs(urllib.urlopen()) 

for item in list: 
    try: 
        if "itunes.apple.com" in row: 
            print "Category:", opensite.find(itemprop="applicationCategory").get_text() 
        else if "play.google.com" in row: 
            print "Category", opensite.find(itemprop="genre").get_text() 
    except: 
        pass 

Note: ideally I'd be passing in a CSV (called "sample", with one column, "URL"), so I believe my loop would start with

for row in sample.URL: 

but I thought it was more helpful to show you a list rather than deal with a data frame.

Thanks in advance.
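
For the CSV case mentioned in the note, here is a minimal sketch that reads the URL column with the standard csv module instead of a data frame. The column name "URL" comes from the question. The helper name read_urls and the in-memory stand-in for the file are assumptions made for illustration only.

```python
import csv
import io


def read_urls(csv_file, column="URL"):
    """Return the non-empty values of `column` from an open CSV file object."""
    urls = []
    for row in csv.DictReader(csv_file):
        # Missing cells come back as None, so fall back to an empty string.
        value = (row.get(column) or "").strip()
        if value:
            urls.append(value)
    return urls


# In-memory stand-in for a hypothetical CSV file with a single "URL" column.
sample = io.StringIO(
    "URL\n"
    "https://play.google.com/store/apps/details?id=com.rovio.gold&hl=en\n"
    "https://itunes.apple.com/us/app/doodle-jump/id307727765?mt=8\n"
)

urls = read_urls(sample)
for url in urls:
    print(url)
```

With a real file, the same helper could be called inside a `with open(...) as f:` block, and the resulting list can feed whichever loop fetches and parses the pages.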

Answers

1
from __future__ import print_function  # Support Python 2 and 3
try:
    from urllib import urlopen  # Python 2
except ImportError:
    from urllib.request import urlopen  # Python 3

from bs4 import BeautifulSoup as bs

with open('urls.dat') as url_file:  # Read URLs from the file line by line
    for line in url_file:
        url = line.strip()  # Strip the trailing newline from the URL
        if 'apple.com' in url:
            prop = 'applicationCategory'
        elif 'google.com' in url:
            prop = 'genre'
        else:
            continue  # Skip blank lines and URLs from other sites
        doc = bs(urlopen(url), 'html5lib')  # Fetch and parse (needs the html5lib package)
        tag = doc.find(itemprop=prop)
        if tag is not None:  # The page may not contain the expected tag
            print(tag.get_text())
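
The loop above picks the property by checking whether 'apple.com' or 'google.com' appears anywhere in the URL. A slightly stricter variant, sketched here for Python 3 only, compares the hostname instead, so a query string that happens to mention a store name is not misclassified. The names ITEMPROP_BY_HOST and itemprop_for are hypothetical, and the two itemprop values are simply the ones used in this thread.

```python
from urllib.parse import urlparse

# itemprop attribute that carries the category on each store,
# as used in the question and answers of this thread.
ITEMPROP_BY_HOST = {
    "itunes.apple.com": "applicationCategory",
    "play.google.com": "genre",
}


def itemprop_for(url):
    """Return the itemprop name for a store URL, or None if the host is unknown."""
    host = urlparse(url.strip()).hostname  # lowercased hostname, or None
    return ITEMPROP_BY_HOST.get(host)


print(itemprop_for("https://itunes.apple.com/us/app/doodle-jump/id307727765?mt=8"))
print(itemprop_for("https://play.google.com/store/apps/details?id=com.rovio.gold&hl=en"))
```

A caller could skip any URL for which the helper returns None, which plays the same role as the `continue` branch in the loop.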
1

Try this to read the URLs from the list:

from __future__ import print_function  # Same output on Python 2 and 3

from bs4 import BeautifulSoup as bs
import requests

# Curly braces make this a set, so iteration order is not the order written here.
urls = {"https://play.google.com/store/apps/details?id=com.tov.google.ben10Xenodromeplus&hl=en",
        "https://play.google.com/store/apps/details?id=com.doraemon.doraemonRepairShopSeasons&hl=en",
        "https://play.google.com/store/apps/details?id=com.KnowledgeAdventure.SchoolOfDragons&hl=en",
        "https://play.google.com/store/apps/details?id=com.turner.stevenrpg&hl=en",
        "https://play.google.com/store/apps/details?id=com.indigokids.mimdoctor&hl=en",
        "https://play.google.com/store/apps/details?id=com.rovio.gold&hl=en",
        "https://itunes.apple.com/us/app/angry-birds/id343200656?mt=8",
        "https://itunes.apple.com/us/app/doodle-jump/id307727765?mt=8",
        "https://itunes.apple.com/us/app/tiny-wings/id417817520?mt=8",
        "https://itunes.apple.com/us/app/flick-home-run-!/id454086751?mt=8",
        "https://itunes.apple.com/us/app/bike-race-pro/id510461370?mt=8"}

def opensite():
    for item in urls:
        # Fetch each page once and parse the HTML
        source = requests.get(item)
        soup = bs(source.text, "html.parser")

        if "itunes.apple.com" in item:
            tag = soup.find('span', {'itemprop': 'applicationCategory'})
        elif "play.google.com" in item:
            tag = soup.find('span', {'itemprop': 'genre'})
        else:
            continue
        if tag is not None:  # Skip pages that lack the expected tag
            print(item, "Category:", tag.text)

opensite()

It prints:

https://itunes.apple.com/us/app/doodle-jump/id307727765?mt=8 Category: Games
https://play.google.com/store/apps/details?id=com.KnowledgeAdventure.SchoolOfDragons&hl=en Category: Role Playing
https://play.google.com/store/apps/details?id=com.tov.google.ben10Xenodromeplus&hl=en Category: Role Playing
https://itunes.apple.com/us/app/tiny-wings/id417817520?mt=8 Category: Games
https://play.google.com/store/apps/details?id=com.doraemon.doraemonRepairShopSeasons&hl=en Category: Role Playing
https://itunes.apple.com/us/app/angry-birds/id343200656?mt=8 Category: Games
https://play.google.com/store/apps/details?id=com.indigokids.mimdoctor&hl=en Category: Role Playing
https://itunes.apple.com/us/app/bike-race-pro/id510461370?mt=8 Category: Games
https://play.google.com/store/apps/details?id=com.rovio.gold&hl=en Category: Role Playing
https://play.google.com/store/apps/details?id=com.turner.stevenrpg&hl=en Category: Role Playing
https://itunes.apple.com/us/app/flick-home-run-!/id454086751?mt=8 Category: Games
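
The printed order differs from the order in which the URLs were written because the collection is built with curly braces, which makes it a set, and sets do not promise any iteration order. If the original order matters, square brackets give a list instead. The sketch below assumes Python 3.7 or later (where dicts keep insertion order) and uses a hypothetical helper, unique_in_order, to drop duplicates without reordering.

```python
def unique_in_order(items):
    """Drop duplicates while keeping the first occurrence of each item in place."""
    # dict keys are unique and, from Python 3.7 on, keep insertion order.
    return list(dict.fromkeys(items))


# Square brackets build a list, which is iterated in the order written.
urls = [
    "https://play.google.com/store/apps/details?id=com.rovio.gold&hl=en",
    "https://itunes.apple.com/us/app/angry-birds/id343200656?mt=8",
    "https://play.google.com/store/apps/details?id=com.rovio.gold&hl=en",  # duplicate
    "https://itunes.apple.com/us/app/doodle-jump/id307727765?mt=8",
]

ordered = unique_in_order(urls)
for url in ordered:
    print(url)
```

This keeps the one useful property of the set (no repeated requests for the same page) while making the output order predictable.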