Extracting data with BeautifulSoup and outputting it to CSV

As mentioned in my previous question, I'm using Beautiful Soup with Python to retrieve weather data from a website. Here is my code:

import requests 
from bs4 import BeautifulSoup 
import urllib3 

#getting the ValidTime 

r = requests.get('http://www.nea.gov.sg/api/WebAPI/?dataset=2hr_nowcast&keyref=781CF461BB6606AD907750DFD1D07667C6E7C5141804F45D')
soup = BeautifulSoup(r.content, "xml") 
time = soup.find('validTime').string 
print "validTime: " + time 

#getting the date 

for currentdate in soup.find_all('item'): 
    element = currentdate.find('forecastIssue') 
    print "date: " + element['date'] 

#getting the time 

for currentdate in soup.find_all('item'): 
    element = currentdate.find('forecastIssue') 
    print "time: " + element['time'] 

#getting the attributes of every <area> (build the list once, not once per area)

area_attrs_li = [area.attrs for area in soup.find('weatherForecast').find_all('area')]
print area_attrs_li

Here is my result:

{'lat': u'1.34039000', 'lon': u'103.70500000', 'name': u'Jurong West', 
'forecast': u'LR'}, {'lat': u'1.31200000', 'lon': u'103.86200000', 'name': 
u'Kallang', 'forecast': u'LR'},

<channel> 
<title>2 Hour Forecast</title> 
<source>Meteorological Services Singapore</source> 
<description>2 Hour Forecast</description> 
<item> 
<title>Nowcast Table</title> 
<category>Singapore Weather Conditions</category> 
<forecastIssue date="18-07-2016" time="03:30 PM"/> 
<validTime>3.30 pm to 5.30 pm</validTime> 
<weatherForecast> 
<area forecast="TL" lat="1.37500000" lon="103.83900000" name="Ang Mo Kio"/> 
<area forecast="SH" lat="1.32100000" lon="103.92400000" name="Bedok"/> 
<area forecast="TL" lat="1.35077200" lon="103.83900000" name="Bishan"/> 
<area forecast="CL" lat="1.30400000" lon="103.70100000" name="Boon Lay"/> 
<area forecast="CL" lat="1.35300000" lon="103.75400000" name="Bukit Batok"/> 
<area forecast="CL" lat="1.27700000" lon="103.81900000" name="Bukit Merah"/>
</channel>

This is how the website looks.

How do I remove the u' from the results? I tried the methods I found while searching, but they don't seem to work.

I'm not strong in Python and have been stuck on this for quite a while.

EDIT

I tried this:

import csv

# Python 2's csv module expects the file to be opened in binary mode
f = open("C:\\scripts\\nea.csv", 'wb')

try:
    writer = csv.writer(f)
    for area in area_attrs_li:
        writer.writerow((time, element['date'], element['time'], area_attrs_li))
finally:
    f.close()

print open("C:/scripts/nea.csv", 'rt').read()

It works, but the records are duplicated in the CSV, and I would like to split the areas apart.

Thank you.

Posted 2016-07-26 by plzhelpmi

Your "website" looks like plain XML. –

Yes, I think it's just plain XML. – plzhelpmi

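Regarding question 2: you don't need to remove the u. It stands for unicode and is only how Python 2 displays a unicode string's representation (for example when you print a whole dict or list); it is not part of the string and is not written when you write the string itself to a file. Can you explain what problem it is causing? –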
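To illustrate the commenter's point, here is a small Python 3 sketch (Python 3 strings have no u prefix at all, but the mechanism is the same). The dict is a hypothetical stand-in for one entry of area_attrs_li: braces, keys and quotes (and, on Python 2, the u prefix) only show up when the whole dict is converted to text, not when its values are written individually.

```python
import csv
import io

# Hypothetical stand-in for one entry of area_attrs_li.
area = {"name": "Jurong West", "forecast": "LR"}

# Converting the whole dict to text uses its repr: braces, keys and quotes
# end up in the output (on Python 2 the u'' prefix would appear here too).
as_repr = str(area)

# Writing the individual values with csv.writer writes only the text itself.
buf = io.StringIO()
csv.writer(buf).writerow([area["name"], area["forecast"]])
as_values = buf.getvalue()

print(as_repr)    # {'name': 'Jurong West', 'forecast': 'LR'}
print(as_values)  # Jurong West,LR
```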

Edit 1 (on topic):

You are missing escape characters. This is the error:

C:\scripts>python neaweather.py 
File "neaweather.py", line 30 
writer.writerow(('time', 'element['date']', 'element['time']', 'area_attrs_li')) 
           ^
SyntaxError: invalid syntax

With the inner quotes escaped:

writer.writerow(('time', 'element[\'date\']', 'element[\'time\']', 'area_attrs_li'))

Edit 2:

If you want to insert the values rather than the literal text:

writer.writerow((time, element['date'], element['time'], area_attrs_li))
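The difference between Edit 1 and Edit 2 is whether the row contains string literals or the variables' values. A minimal Python 3 sketch, with a plain dict standing in (hypothetically) for the question's element tag and an in-memory buffer instead of the CSV file:

```python
import csv
import io

# Hypothetical stand-ins for the variables in the question's script.
time = "3.30 pm to 5.30 pm"
element = {"date": "18-07-2016", "time": "03:30 PM"}

buf = io.StringIO()
writer = csv.writer(buf)
# Quoted names are just text: this row contains the words, not the data.
writer.writerow(('time', 'element[\'date\']', 'element[\'time\']'))
# Unquoted names are evaluated, so this row contains the actual values.
writer.writerow((time, element['date'], element['time']))

literal_row, value_row = buf.getvalue().splitlines()
print(literal_row)  # time,element['date'],element['time']
print(value_row)    # 3.30 pm to 5.30 pm,18-07-2016,03:30 PM
```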

Edit 3:

for area in area_attrs_li: 
    writer.writerow((time, element['date'], element['time'], area))
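A Python 3 sketch of the same idea, assuming the area dicts look like the ones printed in the question (the sample values below are hypothetical stand-ins for the parsed data). The writer is created once, each area gets its own row, and the area's fields become separate columns instead of one cell holding the dict. For a real file on Python 3, open it with open(path, "w", newline="").

```python
import csv
import io

# Hypothetical sample data standing in for the values parsed in the question.
valid_time = "3.30 pm to 5.30 pm"
issue = {"date": "18-07-2016", "time": "03:30 PM"}
area_attrs_li = [
    {"lat": "1.37500000", "lon": "103.83900000", "name": "Ang Mo Kio", "forecast": "TL"},
    {"lat": "1.32100000", "lon": "103.92400000", "name": "Bedok", "forecast": "SH"},
]

buf = io.StringIO()       # stands in for the output file
writer = csv.writer(buf)  # create the writer once, outside the loop
for area in area_attrs_li:
    # One row per area, with the area's fields as separate columns.
    writer.writerow([valid_time, issue["date"], issue["time"],
                     area["name"], area["forecast"], area["lat"], area["lon"]])

rows = buf.getvalue().splitlines()
print(len(rows))  # 2
```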

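Edit 4:

Splitting the results onto different lines this way is not entirely correct, but it should show you how to parse and split the data; change it as needed. To split the area elements again as shown in your image, you can parse them like this: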

for area in area_attrs_li:
    # area is a dict here; turn it into its text form first so the
    # string replacements below work
    area = str(area)

    # cut off the characters you don't need
    area = area.replace('[','')
    area = area.replace(']','')
    area = area.replace('{','')
    area = area.replace('}','')

    # turn u'...' and '...' into "..."
    area = area.replace("u'","\"").replace("'","\"")

    # split the string into a list
    areaList = area.split(",")

    # create your own CSV separator
    ownRowElement = ';'.join(areaList)

    writer.writerow((time, element['date'], element['time'], ownRowElement))
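As an alternative to the string replacements (not what the answer above does), csv.DictWriter can write each area dict directly, one column per key. A Python 3 sketch, assuming every dict has the lat, lon, name and forecast keys shown in the question; the validTime value is a hypothetical sample:

```python
import csv
import io

area_attrs_li = [
    {"lat": "1.34039000", "lon": "103.70500000", "name": "Jurong West", "forecast": "LR"},
    {"lat": "1.31200000", "lon": "103.86200000", "name": "Kallang", "forecast": "LR"},
]
valid_time = "3.30 pm to 5.30 pm"  # hypothetical sample value

buf = io.StringIO()
# DictWriter maps each dict key to a named column, so no string surgery on
# brackets, braces or quote characters is needed.
writer = csv.DictWriter(buf, fieldnames=["validTime", "name", "forecast", "lat", "lon"])
writer.writeheader()
for area in area_attrs_li:
    # Merge the shared validTime into each area's dict before writing it.
    writer.writerow({"validTime": valid_time, **area})

print(buf.getvalue())
```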

Off-topic: this works for me:

import csv
import json

x = """[
    {'lat': u'1.34039000', 'lon': u'103.70500000', 'name': u'Jurong West','forecast': u'LR'}
]"""

# Turn the Python-literal text into JSON (u'...' and '...' become "...") and parse it
jsontxt = json.loads(x.replace("u'","\"").replace("'","\""))

with open("test.csv", "w") as fh:
    f = csv.writer(fh)

    # Write CSV header; if you don't need it, remove this line
    f.writerow(['lat', 'lon', 'name', 'forecast'])

    for jsontext in jsontxt:
        f.writerow([jsontext["lat"],
                    jsontext["lon"],
                    jsontext["name"],
                    jsontext["forecast"],
                    ])
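Because the text is a Python literal rather than JSON, ast.literal_eval from the standard library can parse it directly, which avoids the quote replacement (that trick produces invalid JSON as soon as a value contains an apostrophe). A Python 3 sketch using the same sample string, writing to an in-memory buffer instead of test.csv:

```python
import ast
import csv
import io

# The same sample text as above: a Python literal (note the u'' prefixes).
x = """[
    {'lat': u'1.34039000', 'lon': u'103.70500000', 'name': u'Jurong West','forecast': u'LR'}
]"""

# literal_eval safely evaluates literals only (lists, dicts, strings, ...).
areas = ast.literal_eval(x)

buf = io.StringIO()  # stands in for test.csv
writer = csv.writer(buf)
writer.writerow(["lat", "lon", "name", "forecast"])
for area in areas:
    writer.writerow([area["lat"], area["lon"], area["name"], area["forecast"]])

print(buf.getvalue())
```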

Posted 2016-07-26 07:46:14 by user2853437

The question is about XML, not JSON. –

Hi, splitting it into different columns works :) How do I edit the code if I want to run it against the website? :) – plzhelpmi
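One possible way to run it against the live feed, sketched in Python 3 with only the standard library: urllib.request instead of requests, and xml.etree.ElementTree instead of BeautifulSoup (both are substitutions, not the code from the question). It assumes the response has the same structure as the sample XML in the question, with no XML namespaces; the URL is copied from the question and may no longer respond.

```python
import csv
import urllib.request
import xml.etree.ElementTree as ET

# Copied from the question; the endpoint and key may no longer be valid.
NEA_URL = ("http://www.nea.gov.sg/api/WebAPI/?dataset=2hr_nowcast"
           "&keyref=781CF461BB6606AD907750DFD1D07667C6E7C5141804F45D")

HEADER = ["validTime", "date", "time", "name", "forecast", "lat", "lon"]


def fetch_xml(url):
    """Download the feed and return it as bytes (no retries or error handling)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def rows_from_xml(xml_bytes):
    """Yield one row per <area>, in the column order given by HEADER."""
    root = ET.fromstring(xml_bytes)
    valid_time = root.findtext(".//validTime")
    issue = root.find(".//forecastIssue")  # assumed to be present
    for area in root.iter("area"):
        yield [valid_time, issue.get("date"), issue.get("time"),
               area.get("name"), area.get("forecast"),
               area.get("lat"), area.get("lon")]


def write_csv(rows, path):
    # newline="" stops the csv module from producing blank lines on Windows.
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(HEADER)
        writer.writerows(rows)


def main(path="nea.csv"):
    """Fetch the live feed and write it to path (needs network access)."""
    write_csv(rows_from_xml(fetch_xml(NEA_URL)), path)
```

Calling main() fetches the feed and writes nea.csv; main("C:/scripts/nea.csv") writes to the path used in the question. Keeping the parsing in rows_from_xml means it can also be tried offline on a saved copy of the XML.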

Agreed. The question has Python dictionaries, not JSON. –
