2016-06-07 2 views
0

Python string to JSON, and getting the HTML out of it

I have a JSON string, and I need to get the "html" value and process it with BeautifulSoup. Python does not seem to accept it as valid JSON:
{"d":{"__type":"NGW.WebClient.AjaxMessages.GVGameHtmlResponse","res":0,"html":"\u003cdiv id=\"gvGameFixed\" class=\"Hockey\" leagueid=\"4\" brmatchid=\"0\"\u003e\r\n\t\r\n \u003cdiv class=\"gameHead\"\u003e\r\n  \u003cdiv class=\"section\"\u003e\r\n   \u003cdiv class=\"subtitle\"\u003eHockey - NHL\u003c/div\u003e\r\n\t\t\t\u003cdiv class=\"desc\"\u003eHp Pavillion At San Jose\u003c/div\u003e\r\n  \u003cdiv class=\"title\"\u003ePit Penguins vs SJ Sharks\u003c/div\u003e\r\n  \u003c/div\u003e\r\n  \u003cdiv class=\"nav\"\u003e\r\n   \u003cbutton id=\"btnMyBets\" type=\"button\" class=\"btnMyBets\" onclick=\"loadMyWagersFrameOnGame(70892);\"\u003eMy Bets on This Game\u003c/button\u003e\r\n  \u003c/div\u003e\r\n \u003c/div\u003e\r\n\r\n \r\n \r\n\r\n\u003c/div\u003e\r\n\r\n\u003cdiv id=\"gvPropContainer\" class=\"scrollInner\"\u003e\r\n \u003cdiv id=\"gvGameNoProps\"\u003e\r\n  This event has no active propositions\r\n \u003c/div\u003e\r\n \r\n \r\n\r\n\u003cdiv class=\"gvProp\" pid=\u00272736341\u0027 order=\u002710\u0027\u003e\r\n \u003cdiv class=\"propTitle\"\u003e\u003cspan\u003eGame Winner\u003c/span\u003e\u003c/div\u003e \r\n \u003cul class=\u0027oneUp\u0027\u003e\r\n  \r\n  \r\n\r\n\r\n\u003cli onmouseover=\u0027mouseOver(this);\u0027 onmouseout=\u0027mouseOut(this);\u0027 onclick=\u0027betSlipAdd(event||window.event, this);\u0027 class=\u0027\u0027 pid=\u00272736341\u0027 pos=\u00271\u0027 odds=\u00271.3704\u0027 pts=\u00271.5\u0027\u003e\r\n\r\n\t\u003cdiv class=\"box\"\u003e\r\n\t \u003cdiv class=\"propText\"\u003epit penguins +1.5\u003c/div\u003e\r\n\t \u003cdiv class=\"odds\"\u003e\r\n\t \t−270\r\n   \u003cimg alt=\"flat\" src=\"Skin/Pinoccio/Images/odds_flat.png?v=4.5.4.81\"/\u003e\r\n\t \u003c/div\u003e\r\n\t \u003cdiv class=\u0027selStatus\u0027\u003e\u003cspan\u003e\u003c/span\u003e\u003c/div\u003e\r\n \u003c/div\u003e\r\n\u003c/li\u003e\r\n\r\n\r\n\r\n\u003cli onmouseover=\u0027mouseOver(this);\u0027 onmouseout=\u0027mouseOut(this);\u0027 
onclick=\u0027betSlipAdd(event||window.event, this);\u0027 class=\u0027\u0027 pid=\u00272736341\u0027 pos=\u00272\u0027 odds=\u00273.21\u0027 pts=\u00271.5\u0027\u003e\r\n\r\n\t\u003cdiv class=\"box\"\u003e\r\n\t \u003cdiv class=\"propText\"\u003esj sharks −1.5\u003c/div\u003e\r\n\t \u003cdiv class=\"odds\"\u003e\r\n\t \t+221\r\n   \u003cimg alt=\"flat\" src=\"Skin/Pinoccio/Images/odds_flat.png?v=4.5.4.81\"/\u003e\r\n\t \u003c/div\u003e\r\n\t \u003cdiv class=\u0027selStatus\u0027\u003e\u003cspan\u003e\u003c/span\u003e\u003c/div\u003e\r\n \u003c/div\u003e\r\n\u003c/li\u003e\r\n\r\n \u003c/ul\u003e\r\n\u003c/div\u003e\r\n\r\n\r\n\u003cdiv class=\"gvProp\" pid=\u00272736342\u0027 order=\u002720\u0027\u003e\r\n \u003cdiv class=\"propTitle\"\u003e\u003cspan\u003eGame Total - Incl OT/Pen\u003c/span\u003e\u003c/div\u003e \r\n \u003cul class=\u0027oneUp\u0027\u003e\r\n  \r\n  \r\n\r\n\r\n\u003cli onmouseover=\u0027mouseOver(this);\u0027 onmouseout=\u0027mouseOut(this);\u0027 onclick=\u0027betSlipAdd(event||window.event, this);\u0027 class=\u0027\u0027 pid=\u00272736342\u0027 pos=\u00271\u0027 odds=\u00272.39\u0027 pts=\u00275.5\u0027\u003e\r\n\r\n\t\u003cdiv class=\"box\"\u003e\r\n\t \u003cdiv class=\"propText\"\u003eover 5.5\u003c/div\u003e\r\n\t \u003cdiv class=\"odds\"\u003e\r\n\t \t+139\r\n   \u003cimg alt=\"flat\" src=\"Skin/Pinoccio/Images/odds_flat.png?v=4.5.4.81\"/\u003e\r\n\t \u003c/div\u003e\r\n\t \u003cdiv class=\u0027selStatus\u0027\u003e\u003cspan\u003e\u003c/span\u003e\u003c/div\u003e\r\n \u003c/div\u003e\r\n\u003c/li\u003e\r\n\r\n\r\n\r\n\u003cli onmouseover=\u0027mouseOver(this);\u0027 onmouseout=\u0027mouseOut(this);\u0027 onclick=\u0027betSlipAdd(event||window.event, this);\u0027 class=\u0027\u0027 pid=\u00272736342\u0027 pos=\u00272\u0027 odds=\u00271.6061\u0027 pts=\u00275.5\u0027\u003e\r\n\r\n\t\u003cdiv class=\"box\"\u003e\r\n\t \u003cdiv class=\"propText\"\u003eunder 5.5\u003c/div\u003e\r\n\t \u003cdiv class=\"odds\"\u003e\r\n\t \t−165\r\n   
\u003cimg alt=\"flat\" src=\"Skin/Pinoccio/Images/odds_flat.png?v=4.5.4.81\"/\u003e\r\n\t \u003c/div\u003e\r\n\t \u003cdiv class=\u0027selStatus\u0027\u003e\u003cspan\u003e\u003c/span\u003e\u003c/div\u003e\r\n \u003c/div\u003e\r\n\u003c/li\u003e\r\n\r\n \u003c/ul\u003e\r\n\u003c/div\u003e\r\n\r\n\r\n\u003cdiv class=\"gvProp\" pid=\u00272736343\u0027 order=\u002730\u0027\u003e\r\n \u003cdiv class=\"propTitle\"\u003e\u003cspan\u003eGame Winner ML - Incl OT/Pen\u003c/span\u003e\u003c/div\u003e \r\n \u003cul class=\u0027oneUp\u0027\u003e\r\n  \r\n  \r\n\r\n\r\n\u003cli onmouseover=\u0027mouseOver(this);\u0027 onmouseout=\u0027mouseOut(this);\u0027 onclick=\u0027betSlipAdd(event||window.event, this);\u0027 class=\u0027\u0027 pid=\u00272736343\u0027 pos=\u00271\u0027 odds=\u00272.16\u0027 pts=\u00270\u0027\u003e\r\n\r\n\t\u003cdiv class=\"box\"\u003e\r\n\t \u003cdiv class=\"propText\"\u003epit penguins\u003c/div\u003e\r\n\t \u003cdiv class=\"odds\"\u003e\r\n\t \t+116\r\n   \u003cimg alt=\"flat\" src=\"Skin/Pinoccio/Images/odds_flat.png?v=4.5.4.81\"/\u003e\r\n\t \u003c/div\u003e\r\n\t \u003cdiv class=\u0027selStatus\u0027\u003e\u003cspan\u003e\u003c/span\u003e\u003c/div\u003e\r\n \u003c/div\u003e\r\n\u003c/li\u003e\r\n\r\n\r\n\r\n\u003cli onmouseover=\u0027mouseOver(this);\u0027 onmouseout=\u0027mouseOut(this);\u0027 onclick=\u0027betSlipAdd(event||window.event, this);\u0027 class=\u0027\u0027 pid=\u00272736343\u0027 pos=\u00272\u0027 odds=\u00271.7299\u0027 pts=\u00270\u0027\u003e\r\n\r\n\t\u003cdiv class=\"box\"\u003e\r\n\t \u003cdiv class=\"propText\"\u003esj sharks\u003c/div\u003e\r\n\t \u003cdiv class=\"odds\"\u003e\r\n\t \t−137\r\n   \u003cimg alt=\"flat\" src=\"Skin/Pinoccio/Images/odds_flat.png?v=4.5.4.81\"/\u003e\r\n\t \u003c/div\u003e\r\n\t \u003cdiv class=\u0027selStatus\u0027\u003e\u003cspan\u003e\u003c/span\u003e\u003c/div\u003e\r\n \u003c/div\u003e\r\n\u003c/li\u003e\r\n\r\n 
\u003c/ul\u003e\r\n\u003c/div\u003e\r\n\r\n\u003c/div\u003e\r\n\r\n","gameID":70892,"maxPropStamp":1465233306663,"progStamp":1464871557570,"msgsHtml":"\r\n\r\n\u003cdiv id=\"eventMessages\"\u003e\r\n \u003cul id=\"eventMessagesContent\"\u003e\r\n  \r\n  \r\n \u003c/ul\u003e\r\n \u003cdiv class=\"viewMoreBtn collapsed\"\u003e\r\n  \u003cinput type=\"hidden\" id=\"strViewMoreMessages\" value=\"Show messages\"/\u003e\r\n  \u003cinput type=\"hidden\" id=\"strHideMessages\" value=\"Hide messages\"/\u003e\r\n  \u003cp\u003e\r\n   Show messages\r\n  \u003c/p\u003e\r\n \u003c/div\u003e\r\n\u003c/div\u003e","maxMessageStamp":1465233485217}} 


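The problems are:
1. Why can't I convert this to JSON?
2. The biggest problem: how can I convert the Unicode-escaped string into plain HTML (I could also pull it out with a regex), which I then have to process with bs4?

Can you help? How do I take this string and process it with BeautifulSoup?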

Thanks.

+0

It is valid JSON, at least according to JSONLint.com. Why do you think it isn't? I was able to load it without problems using `json.loads()`, and accessed the HTML with `data['d']['html']`. –
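To make the comment above concrete, here is a minimal sketch of the `json.loads()` route. The `raw` string is a shortened stand-in for the response in the question, not the full payload, and the variable names are only illustrative. It also shows that the `\u003c`-style escapes need no separate conversion step: the JSON parser decodes them into ordinary `<` and `>` characters.

```python
import json

# Shortened stand-in for the response in the question (assumption: same shape,
# far fewer fields). The raw-string prefix keeps the backslashes literal, as
# they would appear in the text received from the server.
raw = r'{"d":{"res":0,"html":"\u003cdiv class=\"title\"\u003ePit Penguins vs SJ Sharks\u003c/div\u003e\r\n","gameID":70892}}'

data = json.loads(raw)

# The escapes are decoded by the parser itself; this is already plain HTML.
html = data['d']['html']

print(html)
```

The resulting `html` value is a normal Python string, so it can be handed straight to `bs4.BeautifulSoup(html, 'html.parser')` without any regex work.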

+0

I tried json.loads() ... I have spent a lot of time on this, with json, simplejson, and so on ... – simopopov

+0

Then show us which error you got. The JSON posted here has no problems. How did you obtain the data in the first place? –
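Since the error was never posted, the following is only a guess at one possible cause, not a diagnosis. If the text was obtained in a way that left real line breaks inside the string values (for example by copying it out of a viewer that wraps lines), the default parser rejects it, because raw control characters are not allowed inside JSON strings. The `raw` value below is a hypothetical damaged copy made up for illustration; `strict=False` is a documented option of `json.loads()` that tolerates such characters.

```python
import json

# Hypothetical damaged copy: a real newline sits inside the "html" string.
raw = '{"d": {"html": "\\u003cp\\u003eline one\nline two\\u003c/p\\u003e"}}'

try:
    json.loads(raw)
    strict_ok = True
except ValueError:  # json.JSONDecodeError is a subclass of ValueError
    strict_ok = False

# With strict=False, control characters inside strings are accepted.
data = json.loads(raw, strict=False)
html = data['d']['html']

print(strict_ok)
print(repr(html))
```

If the strict call succeeds on the real data, this is not the problem, and the actual traceback is still needed to say more.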

Answers

0

This manages to read the data correctly (where test.json contains your data):

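#!/usr/bin/env python

import json

import bs4  # third-party: the beautifulsoup4 package

# test.json holds the JSON response from the question
with open('test.json') as file_:
    json_data = json.load(file_)

# json.load has already decoded the \u003c-style escapes into plain HTML
soup = bs4.BeautifulSoup(json_data['d']['html'], 'html.parser')

print(soup)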
+0

Yes, as I already commented, the JSON is valid. There is no special trick to loading it. –
