2017-04-13 1 views
1

다음 코드를 사용하여 목록을 살펴보고 정보를 추출하여 새 목록에 넣습니다.Python의 목록에서 BeautifulSoup 태그를 제거하십시오.

0을 찾으면 0이 추가됩니다. '없음'이 있으면 0이 추가됩니다. 목록 요소의 세 번째 종류는 BeautifulSoup에서 추출한 태그입니다. 내가 할 수 있기를 원하는 것은

내가 태그의 정보가 방해 받고있는 regex 함께 일하고 있어요 주어진 그러나, 태그 내부 에서 몇 가지 정보를 추출하고 newList에 그것을 추가입니다 .

내가 가진 코드는 여기에 주어집니다 :

list = ['<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=826">11 votes for, 1 vote against, 15 absences, between 1999&ndash;2014</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=811">8 votes for, 1 vote against, 3 absences, between 1999&ndash;2015</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=1050">4 votes for, 0 votes against, 3 absences, between 2002&ndash;2004</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6686">4 votes for, 1 vote against, 2 absences, between 2004&ndash;2014</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6703">5 votes for, 0 votes against, 4 absences, between 2011&ndash;2016</a>', 'None', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6688">3 votes for, 7 votes against, 1 absence, between 2002&ndash;2015</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=1049">0 votes for, 6 votes against, between 2002&ndash;2003</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=975">1 vote for, 1 vote against, 13 absences, between 2006&ndash;2016</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=984">0 votes for, 4 votes against, 3 absences, between 2007&ndash;2016</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=1065">45 votes for, 12 votes against, 32 absences, between 2007&ndash;2017</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=1027">2 votes for, 3 votes against, 8 absences, between 2011&ndash;2016</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6706">3 votes for, 1 vote against, between 2010&ndash;2012</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6764">5 votes for, 3 votes against, 4 absences, between 2016&ndash;2017</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6761">4 votes for, 4 votes against, 5 absences, between 2016&ndash;2017</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6757">0 votes for, 3 votes against, between 2014&ndash;2015</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6672">0 votes for, 13 votes against, 4 absences, between 2012&ndash;2014</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6674">5 votes for, 0 votes against, in 2013</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6673">13 votes for, 0 votes against, 2 absences, between 2011&ndash;2016</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6684">0 votes for, 3 votes against, 1 absence, in 2012</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6674">5 votes for, 0 votes against, in 2013</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6702">8 votes for, 0 votes against, 1 absence, between 2011&ndash;2014</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6680">0 votes for, 21 votes against, 4 absences, between 2011&ndash;2016</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=1110">3 votes for, 18 votes against, 5 absences, between 2010&ndash;2015</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6694">5 votes for, 10 votes against, 4 absences, between 2010&ndash;2015</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6699">0 votes for, 3 votes against, 6 absences, between 2012&ndash;2014</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6693">6 votes for, 6 votes against, 4 absences, between 2010&ndash;2013</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6681">10 votes for, 0 votes against, 2 absences, between 2012&ndash;2015</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=1109">1 vote for, 3 votes against, 1 absence, between 2004&ndash;2011</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=1109">1 vote for, 3 votes against, 1 absence, between 2004&ndash;2011</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6685">17 votes for, 1 vote against, between 2011&ndash;2015</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6733">2 votes for, 6 votes against, 2 absences, between 2011&ndash;2015</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6711">2 votes for, 0 votes against, 2 absences, in 2013</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6716">0 votes for, 5 votes against, between 2012&ndash;2013</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6731">0 votes for, 12 votes against, between 2008&ndash;2017</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6756">0 votes for, 4 votes against, 1 absence, between 2015&ndash;2016</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6679">1 vote for, 21 votes against, 4 absences, between 2010&ndash;2016</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6690">5 votes for, 3 votes against, between 2013&ndash;2016</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6691">7 votes for, 7 votes against, between 2010&ndash;2014</a>', 'None', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6677">7 votes for, 0 votes against, between 2011&ndash;2012</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6676">0 votes for, 7 votes against, between 2011&ndash;2012</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=363">0 votes for, 4 votes against, 1 absence, in 2003</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=811">8 votes for, 1 vote against, 3 absences, between 1999&ndash;2015</a>', 'None', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=1074">2 votes for, 14 votes against, 16 absences, between 1998&ndash;2014</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=1132">0 votes for, 1 vote against, in 2010</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6687">0 votes for, 9 votes against, 2 absences, between 2010&ndash;2016</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6682">0 votes for, 2 votes against, in 2011</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=1052">4 votes for, 6 votes against, 5 absences, between 1997&ndash;2016</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6671">0 votes for, 4 votes against, 2 absences, between 2010&ndash;2017</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=1113">0 votes for, 11 votes against, between 2011&ndash;2016</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=1136">0 votes for, 6 votes against, 2 absences, between 2010&ndash;2016</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=996">2 votes for, 0 votes against, 8 absences, between 2007&ndash;2009</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=1084">1 vote for, 1 vote against, 4 absences, between 2010&ndash;2016</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=837">10 votes for, 0 votes against, 4 absences, between 2003&ndash;2016</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6683">0 votes for, 4 votes against, 1 absence, between 2012&ndash;2013</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6678">0 votes for, 12 votes against, between 2013&ndash;2016</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6698">2 votes for, 2 votes against, 1 absence, between 2010&ndash;2014</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=1079">5 votes for, 1 vote against, 5 absences, between 1999&ndash;2016</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6708">2 votes for, 1 vote against, 16 absences, between 2012&ndash;2017</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6709">8 votes for, 5 votes against, 20 absences, between 2011&ndash;2015</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6695">23 votes for, 12 votes against, 14 absences, between 2011&ndash;2016</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6736">0 votes for, 3 votes against, in 2015</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=842">3 votes for, 1 vote against, 3 absences, between 2004&ndash;2016</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=1087">3 votes for, 13 votes against, 12 absences, between 2002&ndash;2016</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=1071">2 votes for, 1 vote against, 2 absences, between 2008&ndash;2009</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=1051">6 votes for, 6 votes against, 12 absences, between 2005&ndash;2006</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6696">0 votes for, 7 votes against, 1 absence, between 2011&ndash;2012</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6721">0 votes for, 5 votes against, 3 absences, between 2014&ndash;2016</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6734">0 votes for, 7 votes against, 2 absences, between 2015&ndash;2016</a>', 'None', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6758">0 votes for, 2 votes against, 1 absence, in 2016</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=1030">19 votes for, 6 votes against, 6 absences, between 2000&ndash;2016</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6693">6 votes for, 6 votes against, 4 absences, between 2010&ndash;2013</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6697">0 votes for, 2 votes against, in 2011</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6699">0 votes for, 3 votes against, 6 absences, between 2012&ndash;2014</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6704">4 votes for, 1 vote against, between 2011&ndash;2013</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6710">0 votes for, 3 votes against, 1 absence, between 2012&ndash;2014</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6741">2 votes for, 1 vote against, 1 absence, in 2015</a>', 'None', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6747">2 votes for, 0 votes against, 1 absence, in 2016</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6692">4 votes for, 0 votes against, 1 absence, in 2013</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6693">6 votes for, 6 votes against, 4 absences, between 2010&ndash;2013</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6699">0 votes for, 3 votes against, 6 absences, between 2012&ndash;2014</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6746">2 votes for, 0 votes against, 2 absences, in 2016</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6744">0 votes for, 5 votes against, between 2015&ndash;2016</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6743">0 votes for, 5 votes against, between 2015&ndash;2016</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=810">7 votes for, 5 votes against, 3 absences, between 2004&ndash;2014</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=1120">0 votes for, 3 votes against, 2 absences, in 2010</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=1053">13 votes for, 30 votes against, 27 absences, between 2001&ndash;2010</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=1105">0 votes for, 3 votes against, 2 absences, between 2009&ndash;2011</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6705">2 votes for, 0 votes against, 2 absences, between 2013&ndash;2016</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6707">1 vote for, 7 votes against, 4 absences, between 2011&ndash;2014</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6715">0 votes for, 5 votes against, 2 absences, in 2013</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6720">2 votes for, 3 votes against, in 2013</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6719">0 votes for, 4 votes against, 2 absences, between 2012&ndash;2013</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6718">4 votes for, 0 votes against, in 2014</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6667">9 votes for, 57 votes against, 15 absences, between 2011&ndash;2015</a>'] 

newList = [] 
digitReg = r"\d+" 
for thing in list: 
aggregate = 0 
    if thing == '0': 
     newList.append(0) 
    elif thing == 'None': 
     newList.append(0) 
    else: 
     matches = re.findall(digitReg,thing) 
     forNum = int(matches[0]) 
     againstNum = int(matches[1]) 
     aggregate = forNum - againstNum 
     newList.append(aggregate) 
print newList 
print len(newList) 

문제는 골재의 값을 던지고있는 태그 자체가 그 안에 자리를 가지고 있다는 것입니다.

보통 코드는 int(matches[2])int(matches[3])으로 변경됩니다. 그러나 이것은 다른 목록에서이 코드를 실행하고 태그 자체의 일치 항목 수가 변경되므로 신뢰할 수 없습니다.

일치하는 항목이 발견되기 전에 목록에서 태그를 제거 할 수 있습니까?

답변

2

은 당신이 할 수있는 아름다운 수프를 사용하여 각 태그의 내부 텍스트를 추출하려면

aggregate = 0 
for thing in list: 
    if thing == '0': 
     newList.append(0) 
    elif thing == 'None': 
     newList.append(0) 
    else: 
     matches = re.findall(digitReg, BeautifulSoup(thing,'html.parser').text) 
     againstNum = int(matches[1]) 
     aggregate = forNum - againstNum 
     newList.append(aggregate) 
+1

완벽! 대단히 감사합니다! 오류를 막기 위해 BeautifulSoup 대괄호에서 'html.parser'를 가져와야했지만 지금은 매력처럼 작동합니다. – modestmotion

관련 문제