2017-11-12

beautifulsoup web scraping - Python

I'm still learning the intricacies of using Beautiful Soup.

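I'm trying to create a data frame with each player's name, position, and game/injury status from http://www.nfl.com/injuries?week=1. I tried to adapt some code I found, but I either get nothing back or get nowhere. Any suggestions on where I'm going wrong?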

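Edit: After digging a bit more, I found the original problem was the tag. The data sits inside <script type="text/javascript"> tags, so I changed my search to look for those. Now I'm getting closer, but I don't know how to extract the relevant data. How can I get the {player: "", position: "", ...} data out?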

Here is a sample of what I've tried:

import bs4 
import requests as re 
import pandas as pd  

alpha = re.get('http://www.nfl.com/injuries?week=1') 

beta = bs4.BeautifulSoup(alpha.text,'lxml') 
#print(beta) 

gama = beta.findAll('script', {'type':"text/javascript"}) 
print(gama) 

Sample output:

</script>, <script type="text/javascript"> 
nfl.use("node", "datatable", "datatable-sort", "mobile-panel", "overthrow", 
"overthrow-shadows", "tabview", function(Y) { 
var isTeamAway  = false, 
    isTeamHome  = false, 
    isTeam   = false, 
    homeAbbr  = 'DEN', 
    awayAbbr  = 'LAC', 
    gameWeek  = '1', 
    teamTabHome  = Y.one('.colors-DEN-1'), 
    teamTabAway  = Y.one('.colors-LAC-1'), 
    datatableHome = Y.one('.data-table-DEN-1'), 
    datatableAway = Y.one('.data-table-LAC-1'); 

var dataAway = [
    {player: "Inman Dontrelle ", position: "WR", injury: "Groin", practiceStatus: "Limited Participation in Practice", gameStatus: "Questionable", lastName: "Inman", firstName: "Dontrelle", esbId: "INM264861" },
    {player: "McGrath Sean ", position: "TE", injury: "Knee", practiceStatus: "Limited Participation in Practice", gameStatus: "Questionable", lastName: "McGrath", firstName: "Sean", esbId: "MCG631892" },
    {player: "Attaochu Jeremiah ", position: "DE", injury: "Hamstring", practiceStatus: "Limited Participation in Practice", gameStatus: "Questionable", lastName: "Attaochu", firstName: "Jeremiah", esbId: "ATT290361" },
    {player: "Boston Jayestin ", position: "S", injury: "Calf", practiceStatus: "Limited Participation in Practice", gameStatus: "Questionable", lastName: "Boston", firstName: "Jayestin", esbId: "BOS695248" },
];

var dataHome = [
    {player: "Booker Devontae ", position: "RB", injury: "Wrist", practiceStatus: "Did Not Participate In Practice", gameStatus: "Out", lastName: "Booker", firstName: "Devontae", esbId: "BOO019902" },
    {player: "Talib Aqib ", position: "CB", injury: "--", practiceStatus: "Full Participation in Practice", gameStatus: "--", lastName: "Talib", firstName: "Aqib", esbId: "TAL428789" },
    {player: "Paradis Matthew ", position: "C", injury: "--", practiceStatus: "Full Participation in Practice", gameStatus: "--", lastName: "Paradis", firstName: "Matthew", esbId: "PAR002722" },
    {player: "Kerr Zachariah ", position: "DT", injury: "Knee", practiceStatus: "Did Not Participate In Practice", gameStatus: "Out", lastName: "Kerr", firstName: "Zachariah", esbId: "KER593782" },
    {player: "Peko Kyle ", position: "DT", injury: "Foot", practiceStatus: "Limited Participation in Practice", gameStatus: "Questionable", lastName: "Peko", firstName: "Kyle", esbId: "PEK467819" },
    {player: "Dixon Riley ", position: "P", injury: "--", practiceStatus: "Full Participation in Practice", gameStatus: "--", lastName: "Dixon", firstName: "Riley", esbId: "DIX641722" },
    {player: "Crick Jared ", position: "DE", injury: "Back", practiceStatus: "Did Not Participate In Practice", gameStatus: "Out", lastName: "Crick", firstName: "Jared", esbId: "CRI129618" },
    {player: "Wolfe Derek ", position: "DE", injury: "--", practiceStatus: "Full Participation in Practice", gameStatus: "--", lastName: "Wolfe", firstName: "Derek", esbId: "WOL309455" },
    {player: "Lynch Paxton ", position: "QB", injury: "right Shoulder", practiceStatus: "Did Not Participate In Practice", gameStatus: "Out", lastName: "Lynch", firstName: "Paxton", esbId: "LYN526034" },
    {player: "Gotsis Adam ", position: "DE", injury: "--", practiceStatus: "Full Participation in Practice", gameStatus: "--", lastName: "Gotsis", firstName: "Adam", esbId: "GOT428790" },
    {player: "Thomas Demaryius ", position: "WR", injury: "--", practiceStatus: "Full Participation in Practice", gameStatus: "--", lastName: "Thomas", firstName: "Demaryius", esbId: "THO095855" },
    {player: "Charles Jamaal ", position: "RB", injury: "--", practiceStatus: "Full Participation in Practice", gameStatus: "--", lastName: "Charles", firstName: "Jamaal", esbId: "CHA561428" },
];
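The sample shows that each <script> tag appears to hold one game: the team abbreviations sit in homeAbbr/awayAbbr and the players in the dataAway and dataHome arrays. If you also want to know which team each player belongs to, a minimal sketch under that assumption could split the script text by array. The helper name split_by_team and the trimmed sample string are illustrative only (not from the site or the original post), and the regexes will stop working if the page layout differs from the sample.

```python
import re


def split_by_team(script_text):
    """Hypothetical helper: return (team, player_object_text) pairs.

    Assumes the layout in the sample above: homeAbbr/awayAbbr string
    variables followed by dataAway/dataHome array literals.
    """
    # Builds e.g. {'home': 'DEN', 'away': 'LAC'}
    abbr = dict(re.findall(r"(home|away)Abbr\s*=\s*'(\w+)'", script_text))
    rows = []
    # re.S lets .*? run across the newlines inside each array literal
    for side, body in re.findall(r"var data(Away|Home)\s*=\s*\[(.*?)\];",
                                 script_text, re.S):
        team = abbr.get(side.lower())
        # Each player object sits on one line, so a lazy match up to "}" works
        for obj in re.findall(r"\{player:.*?\}", body):
            rows.append((team, obj))
    return rows


# Trimmed copy of the sample script, for illustration only
sample = """
var isTeamAway  = false,
    homeAbbr  = 'DEN',
    awayAbbr  = 'LAC',
    gameWeek  = '1';
var dataAway = [
    {player: "Inman Dontrelle ", position: "WR", gameStatus: "Questionable" },
    {player: "McGrath Sean ", position: "TE", gameStatus: "Questionable" },
];
var dataHome = [
    {player: "Booker Devontae ", position: "RB", gameStatus: "Out" },
];
"""

for team, obj in split_by_team(sample):
    print(team, obj)
```

Against the live page (if it still serves this layout), you would call split_by_team(g.text) for each tag returned by findAll.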

Answer


You can use a regular expression (regex) like this:

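import bs4
import requests
import pandas as pd
import re

alpha = requests.get('http://www.nfl.com/injuries?week=1')
beta = bs4.BeautifulSoup(alpha.text, 'lxml')
# The injury tables are built from JavaScript inside these <script> tags
gama = beta.findAll('script', {'type': "text/javascript"})
for g in gama:
    # Find the first "{player" in this script and take the rest of that line
    # (. does not match newlines, so the match ends at the line break)
    match = re.search(r'\{player(.*)', g.text)
    if match:
        print(match.group(0))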
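One thing to be aware of: re.search stops at the first match, so the loop above prints at most one player per <script> tag. To collect every player, re.findall returns all non-overlapping matches. Note that when the pattern contains a capturing group, findall returns the group's text rather than the whole match. A small offline sketch, using a made-up two-line script string:

```python
import re

# Two player objects on separate lines, as they appear inside one <script> tag
script_text = (
    'var dataAway = [\n'
    '    {player: "Inman Dontrelle ", position: "WR", gameStatus: "Questionable" },\n'
    '    {player: "McGrath Sean ", position: "TE", gameStatus: "Questionable" },\n'
    '];\n'
)

# No capturing group, and a lazy match that stops at the first "}"
pattern = r'\{player.*?\}'

# re.search returns only the first match in the string
first = re.search(pattern, script_text).group(0)

# re.findall returns every match as a list of strings
every = re.findall(pattern, script_text)

# With the answer's grouped pattern, findall returns only the text after "{player"
grouped = re.findall(r'\{player(.*)', script_text)

print(first)
print(len(every))
```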

Output I got:

{player: "Logan Bennie ", position: "DT", injury: "--", practiceStatus: "Full Participation in Practice", gameStatus: "--", lastName: "Logan", firstName: "Bennie", esbId: "LOG113260" }, 
{player: "Pelon Claudeson ", position: "DE", injury: "--", practiceStatus: "Full Participation in Practice", gameStatus: "--", lastName: "Pelon", firstName: "Claudeson", esbId: "PEL747520" }, 
{player: "Pasztor Austin ", position: "T", injury: "Chest", practiceStatus: "Limited Participation in Practice", gameStatus: "Questionable", lastName: "Pasztor", firstName: "Austin", esbId: "PAS822673" }, 
{player: "Flacco Joseph ", position: "QB", injury: "--", practiceStatus: "Full Participation in Practice", gameStatus: "--", lastName: "Flacco", firstName: "Joseph", esbId: "FLA009602" }, 
{player: "Dupree Alvin ", position: "LB", injury: "Shoulder", practiceStatus: "Did Not Participate In Practice", gameStatus: "Questionable", lastName: "Dupree", firstName: "Alvin", esbId: "DUP507860" }, 
{player: "Palmer Carson ", position: "QB", injury: "--", practiceStatus: "Full Participation in Practice", gameStatus: "--", lastName: "Palmer", firstName: "Carson", esbId: "PAL249055" }, 
{player: "Bortles Robby ", position: "QB", injury: "--", practiceStatus: "Full Participation in Practice", gameStatus: "--", lastName: "Bortles", firstName: "Robby", esbId: "BOR650964" }, 
{player: "Cooper Amari ", position: "WR", injury: "--", practiceStatus: "Full Participation in Practice", gameStatus: "--", lastName: "Cooper", firstName: "Amari", esbId: "COO487703" }, 
{player: "Goode Najee ", position: "LB", injury: "--", practiceStatus: "Full Participation in Practice", gameStatus: "--", lastName: "Goode", firstName: "Najee", esbId: "GOO217526" }, 
{player: "Rogers Chester ", position: "WR", injury: "Hamstring", practiceStatus: "Did Not Participate In Practice", gameStatus: "Out", lastName: "Rogers", firstName: "Chester", esbId: "ROG146742" }, 
{player: "Vannett Nicholas ", position: "TE", injury: "--", practiceStatus: "Full Participation in Practice", gameStatus: "--", lastName: "Vannett", firstName: "Nicholas", esbId: "VAN643509" }, 
{player: "Norris Jared ", position: "LB", injury: "Groin", practiceStatus: "Did Not Participate In Practice", gameStatus: "Out", lastName: "Norris", firstName: "Jared", esbId: "NOR463803" }, 
{player: "Apple Eli ", position: "CB", injury: "--", practiceStatus: "Full Participation in Practice", gameStatus: "--", lastName: "Apple", firstName: "Eli", esbId: "APP195645" }, 
{player: "Anthony Stephone ", position: "LB", injury: "Ankle", practiceStatus: "Limited Participation in Practice", gameStatus: "Questionable", lastName: "Anthony", firstName: "Stephone", esbId: "ANT204590" }, 
{player: "Inman Dontrelle ", position: "WR", injury: "Groin", practiceStatus: "Limited Participation in Practice", gameStatus: "Questionable", lastName: "Inman", firstName: "Dontrelle", esbId: "INM264861" }, 
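The answer stops at printing the raw object text, while the question asks for a data frame. Assuming every value is a double-quoted string with no embedded quotes (true for the lines above, but not guaranteed by the site), each line can be turned into a dict with one more regex and then handed to pandas. parse_player is a hypothetical helper name, and the two input lines are copied from the output above:

```python
import re

import pandas as pd

# Matches key: "value" pairs such as position: "WR"
FIELD = re.compile(r'(\w+):\s*"([^"]*)"')


def parse_player(obj_text):
    """Hypothetical helper: JavaScript object literal -> dict of stripped strings."""
    return {key: value.strip() for key, value in FIELD.findall(obj_text)}


lines = [
    '{player: "Pasztor Austin ", position: "T", injury: "Chest", practiceStatus: "Limited Participation in Practice", gameStatus: "Questionable", lastName: "Pasztor", firstName: "Austin", esbId: "PAS822673" },',
    '{player: "Rogers Chester ", position: "WR", injury: "Hamstring", practiceStatus: "Did Not Participate In Practice", gameStatus: "Out", lastName: "Rogers", firstName: "Chester", esbId: "ROG146742" },',
]

df = pd.DataFrame([parse_player(line) for line in lines])
# Keep only the columns the question asks for (name, position, injury/game status)
df = df[["player", "position", "injury", "gameStatus"]]
print(df)
```

On the live page you would feed parse_player the strings collected from each script tag instead of the hard-coded list.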

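Note: I changed import requests as re to import requests and imported the standard re module separately, because aliasing requests as re shadows the regular-expression module.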
