html
  • parsing
  • selenium
  • beautifulsoup
  • bs4
  • 2016-06-14 4 views 2 likes 
    2

    Hey guys, the main search box on Amazon contains the following element (found while parsing the page):

    <input type="submit" class="nav-input" value="Go" tabindex="7"> 
    

    I found this tag and thought I could use it to write a function that searches for a particular keyword. My idea was:
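    import urllib.parse

    from selenium import webdriver

    path = 'https://www.amazon.co.uk/'

    values = {'s': 'what-I-want-to-search',
              'submit': 'search'}
    data = urllib.parse.urlencode(values)
    data = data.encode('utf-8')

    driver = webdriver.PhantomJS()
    driver.get(path)
    html = driver.page_source

    driver = webdriver.PhantomJS()
    # This is the call in question: driver.get() accepts only a URL,
    # so passing the encoded data as a second argument does not work.
    driver.get(path, data)

    html = driver.page_source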
    

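    Following a sentdex tutorial, I encode the search terms and then send them to the page URL. I have been using Selenium to deal with dynamically loaded web pages, although I think this case would be fine without it. Either way, I need to know how to get Python to search for something on the main site and end up on the search results page. Any help?

    Karma will come your way.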

    Answers

    1

    Using requests and bs4, you need to pass the correct params, which you can see if you look at the Network tab in Chrome dev tools:


    In [4]: from bs4 import BeautifulSoup
    In [5]: import requests
    In [6]: params = {"url": "search-alias=",
       ...:           "field-keywords": "python"}

    In [7]: with requests.Session() as s:
       ...:     url = "https://www.amazon.co.uk/s"
       ...:     r = s.get(url, params=params)
       ...:     soup = BeautifulSoup(r.content, "lxml")
       ...:     # Result-title links; selector taken from the parse() function in this answer
       ...:     for a in soup.select("a.a-link-normal.s-access-detail-page.a-text-normal"):
       ...:         print(a["title"])
       ...:
    Python Programming for the Absolute Beginner 
    Python: The Ultimate Beginner's Guide! 
    Automate the Boring Stuff with Python: Practical Programming for Total Beginners 
    Python: Learn Python in One Day and Learn It Well. Python for Beginners with Hands-on Project. (Learn Coding Fast with Hands-On Project Book 1) 
    Python Crash Course: A Hands-On, Project-Based Introduction to Programming 
    Learning Python 
    Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython 
    Python Cookbook 
    Python for Informatics: Exploring Information 
    Fluent Python 
    Python Playground: Geeky Projects for the Curious Programmer 
    Python in easy steps 
    Learn Python the Hard Way: A Very Simple Introduction to the Terrifyingly Beautiful World of Computers and Code (Zed Shaw's Hard Way) 
    Python: The Ultimate Beginners Guide: Start Coding Today 
    Programming the Raspberry Pi, Second Edition: Getting Started with Python 
    Data Science from Scratch: First Principles with Python 
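
As a rough offline check of what those params turn into, the sketch below builds the query string with the standard library's `urllib.parse.urlencode`, the same function used in the question. The helper name `build_search_url` is made up for illustration; `requests` should produce an equivalent URL on its own when given `params=`.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.amazon.co.uk/s"  # search endpoint used in the answer


def build_search_url(term):
    """Return the search URL for `term` (hypothetical helper, not part of requests)."""
    params = {"url": "search-alias=", "field-keywords": term}
    # urlencode percent-encodes "=" inside values and turns spaces into "+"
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_search_url("python"))
```

Printing the URL like this makes it easy to compare against the request shown in the browser's Network tab before sending anything.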
    

    Breaking the code into functions: to get all the pages, we just need to keep looping until the anchor with the id pagnNextLink no longer appears:

    from bs4 import BeautifulSoup
    import requests
    from urllib.parse import urljoin
    # from urlparse import urljoin -> Python 2


    def parse(soup):
        # Yield the titles of all result links on one page as a single list.
        yield [a["title"] for a in soup.select("a.a-link-normal.s-access-detail-page.a-text-normal")]


    def get(term):
        params = {"url": "search-alias=",
                  "field-keywords": term}

        with requests.Session() as s:
            head = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36"}
            url = "https://www.amazon.co.uk/s"
            r = s.get(url, params=params)
            soup = BeautifulSoup(r.content, "lxml")
            # Note: the first results page is only used to find the next link;
            # its titles are not printed by this loop.
            nxt = soup.select_one("#pagnNextLink")
            # Keep following the "next page" anchor until it no longer appears.
            while nxt:
                cont = requests.get(urljoin("https://www.amazon.co.uk/", nxt["href"]), headers=head)
                soup = BeautifulSoup(cont.content, "lxml")
                for t in parse(soup):
                    print(t)
                nxt = soup.select_one("#pagnNextLink")
    

    If we run the code for a couple of iterations:

    In [5]: get("python") 
    ['Python Machine Learning', 'Effective Python: 59 Specific Ways to Write Better Python (Effective Software Development)', 'Black Hat Python: Python Programming for Hackers and Pentesters', 'Doing Math with Python: Use Programming to Explore Algebra, Statistics, Calculus, and More!', 'Think Python: How to Think Like a Computer Scientist', 'Python Basics, Level 1 (Coding Club) (Coding Club, Level 1)', 'Python for Finance: Analyze Big Financial Data', 'Violent Python: A Cookbook for Hackers, Forensic Analysts, Penetration Testers and Security Engineers', "Python Essential Reference (Developer's Library)", 'Learn Web Scraping With Python In A Day: The Ultimate Crash Course to Learning the Basics of Web Scraping With Python In No Time (Python, Python ... Python Books, Python for Beginners)', 'Programming Python', 'QPython - Python on Android', 'Coding Club Python: Next Steps Level 2', "Python: Programming, Master's Handbook; A TRUE Beginner's Guide! Problem Solving, Code, Data Science, Data Structures & Algorithms (Code like a PRO ... engineering, r programming, iOS development)", 'Python: Complete Crash Course for Becoming an Expert in Python Programming', 'Coding Club Python: Building Big Apps Level 3'] 
    ['High Performance Python: Practical Performant Programming for Humans', '25ft Python No Spill Clean And Fill', 'Learning Python with Raspberry Pi', 'Web Scraping with Python: Collecting Data from the Modern Web', 'Invent Your Own Computer Games with Python, 3rd Edition', 'More Python Programming for the Absolute Beginner', 'Python for Kids: A Playful Introduction to Programming', "Monty Python's Life of Brian", 'Python 3 Object-oriented Programming - Second Edition', 'Introduction to Computation and Programming Using Python', 'Evolution of The Silly Walks T Shirt - Funny TV Ministry - Various Colours and Sizes XS - 3XL', "Hacking Secret Ciphers with Python: A beginner's guide to cryptography and computer programming with Python", 'Monty Python Fluxx', 'MASTER LOCK 8417DPRO Python Cable 1.80 m x 5 mm 2 Keys', "Learn Python: A beginner's guide book to programming python, learning the basics and start coding easily", 'Master Lock Python Disc Cylinder Key Adjustable Braided Steel Cable Lock, 10 x 1800 mm - Black'] 
    
    In [6]: get("c programming") 
    ['C Programming', 'C# 6.0 in a Nutshell: The Definitive Reference', 'PIC microcontrollers Programming in C with examples', 'C++: The Ultimate Crash Course to Learning the Basics of C++ In No Time (c plus plus, C++ for beginners, programming computer, how to program) (HTML, Javascript, ... Java, C++ Course, C++ Development Book 3)', 'Java: The Best Guide to Master Java Programming Fast (Java for Beginners, Java for Dummies, how to program, java app, java programming): Volume 2 (C Programming, HTML, Javascript)', 'A Book on C.: Programming in C.', "Learn C the Hard Way: Practical Exercises on the Computational Subjects You Keep Avoiding (Like C) (Zed Shaw's Hard Way Series)", 'C++: C++ and Hacking for dummies. A smart way to learn C plus plus and beginners guide to computer hacking: Volume 10 (C Programming, HTML, Javascript, Programming, Coding, CSS, Java, PHP)', 'Introduction to Algorithms', 'Programming: Computer Programming for Beginners: Learn the Basics of Java, SQL & C++ - 2. Edition (Coding, C Programming, Java Programming, SQL Programming, JavaScript, Python, PHP)', '21st Century C: C Tips from the New School', 'C For Dummies', 'Learn C# Programming Training DVD - Tutorial Video', 'GT01-C30R2-6P Programming PLC Cable 2.5M for Mitsubishi Melsec A970', 'Programming In C', 'Get Coding!: Learn HTML, CSS & JavaScript & build a website, app & game'] 
    ['Hewlett Packard [HP] Calculator Financial Platinum RPN Algebraic Programmable Ref HP12C PLATINUM', 'C: Easy C Programming for Beginners, Your Step-By-Step Guide To Learning C Programming (C Programming Series)', '4.9M RS232 DB9 F/M PLC Programming Cable Adapter White for Omron CQM1 C200HE HG', 'KOREAN COSMETICS, LG Household & Health Care_ SUM37, Secret Programming Eye C...', 'C++: C++ and Python. C++ for Beginners and Python for Dummies to Learn Fast (C Programming, Programming for beginners, c plus plus, programming ... Developers, Coding, CSS, Java, PHP)', '1:8 Brushless Combo BLC-150C Plus + Ripper 2000KV motor + programming Board', 'Lonely Planet Italian Phrasebook & Audio', 'Full Forgiveness - Let Go of Hurt & Offense With Guided Imagery, Self Hypnosis and Neuro-linguistic Programming (NLP)', 'Accelerated C++: Practical Programming by Example (C++ in Depth Series)', 'Gardena Water Computer C1060plus 1864-20', 'Learning To Build Apps For iPhone and iPad - Training DVD', 'Practical C Programming (A Nutshell handbook)', 'Prince Brat and the Whipping Boy', 'English: Practice Test Papers (Letts Key Stage 2 Success) (Letts Key Stage 1 Success)', 'Arabic For Dummies: Audio Set', 'The Actor and the Text (Applause Acting Series)'] 
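
To check the title selector without hitting the network, here is a minimal sketch that applies the same three-class match to a static snippet. It swaps in the standard library's `html.parser` for BeautifulSoup so it runs with no extra installs; the `SAMPLE` markup and the names `TitleParser` and `extract_titles` are invented for illustration and only mimic the class names used in the answer.

```python
from html.parser import HTMLParser

# The three classes from the answer's selector; a link must carry all of them.
WANTED = {"a-link-normal", "s-access-detail-page", "a-text-normal"}

# Made-up markup for illustration; the real page structure may differ.
SAMPLE = """
<a class="a-link-normal s-access-detail-page a-text-normal" title="Fluent Python" href="/x">Fluent Python</a>
<a class="a-link-normal" title="Sponsored" href="/ad">Ad</a>
<a class="a-link-normal s-access-detail-page a-text-normal" title="Python Cookbook" href="/y">Python Cookbook</a>
"""


class TitleParser(HTMLParser):
    """Collect the title attribute of every <a> that has all WANTED classes."""

    def __init__(self):
        super().__init__()
        self.titles = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = set((attrs.get("class") or "").split())
        if tag == "a" and WANTED <= classes and attrs.get("title"):
            self.titles.append(attrs["title"])


def extract_titles(html):
    parser = TitleParser()
    parser.feed(html)
    return parser.titles


if __name__ == "__main__":
    print(extract_titles(SAMPLE))
```

Testing against a saved snippet like this can help tell a selector problem apart from a blocked or changed response.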
    

    You can parse whatever you like; I just pulled the titles so we can easily see that we are getting the right data. I would also consider adding a sleep between requests.
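
One way to add that pause is sketched below, assuming the fetching and next-link lookup are passed in as callables so the loop itself stays independent of requests and bs4. The names `crawl`, `fetch`, `find_next`, `delay`, and `max_pages` are illustrative and not from the answer's code; the one-second default is an arbitrary choice.

```python
import time
from urllib.parse import urljoin


def crawl(start_url, fetch, find_next, delay=1.0, max_pages=5):
    """Yield (url, html) for each page, sleeping `delay` seconds between requests.

    fetch(url) returns the page body; find_next(html) returns the href of the
    next-page link, or None when there is no further page.
    """
    url = start_url
    for _ in range(max_pages):
        html = fetch(url)
        yield url, html
        href = find_next(html)
        if not href:
            break
        url = urljoin(url, href)  # resolve relative links against the current page
        time.sleep(delay)         # be polite: pause before the next request


if __name__ == "__main__":
    # Fake three-page site: each "page body" is just the href of the next page.
    pages = {
        "https://example.test/s?page=1": "/s?page=2",
        "https://example.test/s?page=2": "/s?page=3",
        "https://example.test/s?page=3": None,
    }
    for url, _ in crawl("https://example.test/s?page=1", pages.get, lambda h: h, delay=0):
        print(url)
```

With the answer's code, `fetch` could wrap `s.get(url, headers=head).content` and `find_next` could return the `href` of `soup.select_one("#pagnNextLink")` when it exists. The `max_pages` cap is an extra safeguard against looping through every result page.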

    +0

    Amazing, man, thank you. – entercaspa
