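Scraping all products from a retailer's website

We are trying to scrape every product in every category from Forever 21's website. Given a product page, we know how to extract the information we need, and given a category, we can extract all of its products. However, we do not know how to crawl all of the product categories. Here is our code for getting all products in a specific category: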

import requests
from bs4 import BeautifulSoup
#import re

# Query parameters for the AJAX category endpoint.
params = {"action": "getcategory",
          "br": "f21",
          #"category": re.compile(r'\S+'),  # regex attempt, did not work
          "category": "dress",
          "pageno": 1,
          "pagesize": "",
          "sort": "",
          "fsize": "",
          "fcolor": "",
          "fprice": "",
          "fattr": ""}

url = "http://www.forever21.com/Ajax/Ajax_Category.aspx"
js = requests.get(url, params=params).json()
soup = BeautifulSoup(js['CategoryHTML'], "html.parser")
i = 0  # product counter
j = 0  # page counter

# Keep requesting pages until one comes back with no product links.
links = soup.select("div.item_pic a")
while links:
    for a in links:
        #print(a["href"])
        i += 1

    params["pageno"] += 1
    j += 1
    js = requests.get(url, params=params).json()
    soup = BeautifulSoup(js['CategoryHTML'], "html.parser")
    links = soup.select("div.item_pic a")

print(i)
print(j)

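As you can see from the commented-out lines, we tried to use a regular expression for the category, but without success. i and j are just product and page counters. Any suggestions on how to modify or extend this code to get all product categories?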

Answers

You can scrape the category page and get all the subcategories from the navigation menu:

import requests
from bs4 import BeautifulSoup

url = "http://www.forever21.com/Product/Category.aspx?br=f21&category=app-main"
# Send a browser-like User-Agent header with the request.
response = requests.get(url, headers={"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36"})

soup = BeautifulSoup(response.content, "html.parser")
# Take the first CSS class of each navigation <li> as the category name.
menues = [li["class"][0] for li in soup.select("#has_sub .white nav ul > li")]
print(menues)

Prints:

[u'women-new-arrivals', u'want_list', u'dress', u'top_blouses', u'outerwear_coats-and-jackets', u'bottoms', u'intimates_loungewear', u'activewear', u'swimwear_all', u'acc', u'shoes', u'branded-shop-women-clothing', u'sale_women|women', u'women-new-arrivals-clothing-dresses', u'women-new-arrivals-clothing-tops', u'women-new-arrivals-clothing-outerwear', u'women-new-arrivals-clothing-bottoms', u'women-new-arrivals-clothing-intimates-loungewear', u'women-new-arrivals-clothing-swimwear', u'women-new-arrivals-clothing-activewear', u'women-new-arrivals-accessories|women-new-arrivals', u'women-new-arrivals-shoes|women-new-arrivals', u'promo-web-exclusives', u'promo-best-sellers-app', u'backinstock-women', u'promo-shop-by-outfit-women', u'occasion-shop-wedding', u'contemporary-main', u'promo-basics', u'21_items', u'promo-summer-forever', u'promo-coming-soon', u'dress_casual', u'dress_romper', u'dress_maxi', u'dress_midi', u'dress_mini', u'occasion-shop-dress', u'top_blouses-off-shoulder', u'top_blouses-lace-up', u'top_bodysuits-bustiers', u'top_graphic-tops', u'top_blouses-crop-top', u'top_t-shirts', u'sweater', u'top_blouses-sweatshirts-hoodies', u'top_blouses-shirts', u'top_plaids', u'outerwear_bomber-jackets', u'outerwear_blazers', u'outerwear_leather-suede', u'outerwear_jean-jackets', u'outerwear_lightweight', u'outerwear_utility-jackets', u'outerwear_trench-coats', u'outerwear_faux-fur', u'promo-jeans-refresh|bottoms', u'bottoms_pants', u'bottoms_skirt', u'bottoms_shorts', u'bottoms_shorts-active', u'bottoms_leggings', u'bottoms_sweatpants', u'bottom_jeans|', u'intimates_loungewear-bras', u'intimates_loungewear-panties', u'intimates_loungewear-bodysuits-slips', u'intimates_loungewear-seamless', u'intimates_loungewear-accessories', u'intimates_loungewear-sets', u'activewear_top', u'activewear_sports-bra', u'activewear_bottoms', u'activewear_accessories', u'swimwear_tops', u'swimwear_bottoms', u'swimwear_one-piece', u'swimwear_cover-ups', u'acc_features', 
u'acc_jewelry', u'acc_handbags', u'acc_glasses', u'acc_hat', u'acc_hair', u'acc_legwear', u'acc_scarf-gloves', u'acc_home-and-gift-items', u'shoes_features', u'shoes_boots', u'shoes_high-heels', u'shoes_sandalsflipflops', u'shoes_wedges', u'shoes_flats', u'shoes_oxfords-loafers', u'shoes_sneakers', u'Shoes_slippers', u'branded-shop-new-arrivals-women', u'branded-shop-women-clothing-dresses', u'branded-shop-women-clothing-tops', u'branded-shop-women-clothing-outerwear', u'branded-shop-women-clothing-bottoms', u'branded-shop-women-clothing-intimates', u'branded-shop-women-accessories|branded-shop-women-clothing', u'branded-shop-women-accessories-jewelry|', u'branded-shop-shoes-women|branded-shop-women-clothing', u'branded-shop-sale-women', u'/brandedshop/brandlist.aspx', u'promo-branded-boho-me', u'promo-branded-rare-london', u'promo-branded-selfie-leslie', u'sale-newly-added', u'sale_dresses', u'sale_tops', u'sale_outerwear', u'sale_sweaters', u'sale_bottoms', u'sale_intimates', u'sale_swimwear', u'sale_activewear', u'sale_acc', u'sale_shoes', u'the-outlet', u'sale-under-5', u'sale-under-10', u'sale-under-15'] 
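
Some of the printed values are not plain category names: a few contain a "|" separator (for example 'sale_women|women' or 'bottom_jeans|'), one is a URL path ('/brandedshop/brandlist.aspx'), and duplicates are possible. The sketch below is one possible way to tidy the list before using it as the category parameter. The helper name clean_categories is hypothetical, and the rule that the text before "|" is the usable category name is an assumption, not something the site documents.

```python
def clean_categories(raw_classes):
    """Return unique category names in first-seen order."""
    seen = set()
    cleaned = []
    for value in raw_classes:
        # Assumption: the text before "|" is the category name.
        slug = value.split("|", 1)[0].strip()
        # Skip empty values and entries that look like URL paths.
        if not slug or slug.startswith("/"):
            continue
        if slug not in seen:
            seen.add(slug)
            cleaned.append(slug)
    return cleaned


sample = ["dress", "sale_women|women", "bottom_jeans|",
          "/brandedshop/brandlist.aspx", "dress"]
print(clean_categories(sample))  # ['dress', 'sale_women', 'bottom_jeans']
```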

Note the values of the br and category GET parameters. f21 is the "Women" category, and app-main is the main page of a category.
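
To get every product, the category list can be combined with the paging loop from the question. The following is a minimal sketch under some assumptions: crawl_categories, fetch_links_f21 and the max_pages safety cap are hypothetical names introduced here, and the endpoint, parameters and selector are copied from the question and may no longer match the live site. The page fetcher is passed in as a function, so the paging logic can be tried offline with made-up data.

```python
def crawl_categories(categories, fetch_links, max_pages=1000):
    """Collect product links per category, paging until a page is empty.

    fetch_links(category, pageno) must return a list of product URLs.
    max_pages is a safety cap (an assumption, not a site limit).
    """
    results = {}
    for category in categories:
        links = []
        for pageno in range(1, max_pages + 1):
            page_links = fetch_links(category, pageno)
            if not page_links:
                break  # an empty page means the category is exhausted
            links.extend(page_links)
        results[category] = links
    return results


def fetch_links_f21(category, pageno):
    """Fetch one page of product links from the endpoint used in the question."""
    # Imported here so crawl_categories can be tried without these packages.
    import requests
    from bs4 import BeautifulSoup

    params = {"action": "getcategory", "br": "f21", "category": category,
              "pageno": pageno, "pagesize": "", "sort": "", "fsize": "",
              "fcolor": "", "fprice": "", "fattr": ""}
    url = "http://www.forever21.com/Ajax/Ajax_Category.aspx"
    js = requests.get(url, params=params).json()
    soup = BeautifulSoup(js["CategoryHTML"], "html.parser")
    return [a["href"] for a in soup.select("div.item_pic a")]


# Offline demonstration with a stand-in fetcher and made-up data.
fake_site = {"dress": [["/p/1", "/p/2"], ["/p/3"]], "shoes": [["/p/9"]]}


def fake_fetch(category, pageno):
    pages = fake_site.get(category, [])
    return pages[pageno - 1] if pageno <= len(pages) else []


found = crawl_categories(["dress", "shoes"], fake_fetch)
print({name: len(links) for name, links in found.items()})  # {'dress': 3, 'shoes': 1}
```

For a real crawl, something like crawl_categories(menues, fetch_links_f21) would issue one request per page per category, so adding a delay between requests is worth considering.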

Thanks for the help. To clarify, this only gets the categories where br = f21, correct?

@TerryRossi Yes, the subcategories of the f21 category. You could also get the top-level categories from the main store page. – alecxe
