2017-12-04

How do I extract a child tag of a tag in BeautifulSoup? I have the following HTML, from which I extract the p tags:

<p><strong>1. Start big</strong><br><br> 
Make a slam dunk right away. Boom! Just do it! Start strong! If you’re making a list article about poodle outerwear, don’t save the best for last: put that sporty little pool-vest idea right up there at the top. </p> 

<p><strong>2. Hook them and hook them good</strong><br><br> 
A recent study of lists (included in another article about the top ten research studies, natch), assembled by some guy you’ve never heard of from an obscure European university in his spare time, found that Web readers usually don’t make it past the first few items on a list. Sad, isn’t it? I bet you’re already thinking about stopping. Yes, it sucks to know people have shorter attention spans than an overly-caffeinated Himalayan fruit-fly. Make the first few count, okay?</p> 

<p><strong>3. Stay on message</strong><br><br> 
Let’s say you’re writing a list article about the top movies starring Naomi Watts that don’t suck. It’s a short list, if you remember anything about King Kong or her early indie films. I see this kind of thing pop up on <a href="http://www.foxnews.com" rel="nofollow">Fox News</a> and <a href="http://www.metacritic.com" rel="nofollow">Metacritic</a> once in awhile, and I usually can’t stop myself from clicking on them. You get into sort of a click-trance. In fact, hang on a second. I think there might be one on the top opening acts when The Bieb performs in space. Oh yes there is! Okay, back. So, in your article list of the top movies that use a Meatloaf song in the soundtrack, adding that one from Black Sabbath is just not proper usage. We want Meatloaf and Meatloaf only, people! Besides, Black Sabbath is for sissies.</p> 

My code in Python:

from bs4 import BeautifulSoup

soup = BeautifulSoup(page, "lxml")

for content in soup.find_all('p'):
    print(content)

What do I need to add to extract only the strong tag inside each p? I want the output to be:

<strong>1. Start big</strong> 
<strong>2. Hook them and hook them good</strong> 
<strong>3. Stay on message</strong> 

You don't use .find for this.

You need to use the .select method, which takes CSS selectors; you can find .select in the BS4 documentation.

I already tried soup.find_all('p > strong').
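Why find_all('p > strong') fails: find_all matches tag names (and attributes), not CSS selectors, so the string is treated as the name of a tag literally called "p > strong", and nothing matches. A minimal sketch comparing the two calls, assuming a shortened copy of the HTML above and Python's built-in "html.parser" (chosen only so it runs without lxml installed):

```python
from bs4 import BeautifulSoup

# Shortened version of the question's HTML (assumption: enough to show the behavior)
page = """
<p><strong>1. Start big</strong><br><br> Make a slam dunk right away.</p>
<p><strong>2. Hook them and hook them good</strong><br><br> Make the first few count, okay?</p>
<p><strong>3. Stay on message</strong><br><br> We want Meatloaf and Meatloaf only, people!</p>
"""

soup = BeautifulSoup(page, "html.parser")

# find_all treats the string as a tag name, so no tag named "p > strong" is found
by_name = soup.find_all("p > strong")
print(by_name)  # → []

# select parses a CSS selector: <strong> elements that are direct children of <p>
by_css = soup.select("p > strong")
for tag in by_css:
    print(tag)
```

The second loop prints the three strong tags, matching the output the question asks for.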

Answer

from bs4 import BeautifulSoup 

page = """ 
<p><strong>1. Start big</strong><br><br> 
Make a slam dunk right away. Boom! Just do it! Start strong! If you’re making a list article about poodle outerwear, don’t save the best for last: put that sporty little pool-vest idea right up there at the top. </p> 

<p><strong>2. Hook them and hook them good</strong><br><br> 
A recent study of lists (included in another article about the top ten research studies, natch), assembled by some guy you’ve never heard of from an obscure European university in his spare time, found that Web readers usually don’t make it past the first few items on a list. Sad, isn’t it? I bet you’re already thinking about stopping. Yes, it sucks to know people have shorter attention spans than an overly-caffeinated Himalayan fruit-fly. Make the first few count, okay?</p> 

<p><strong>3. Stay on message</strong><br><br> 
Let’s say you’re writing a list article about the top movies starring Naomi Watts that don’t suck. It’s a short list, if you remember anything about King Kong or her early indie films. I see this kind of thing pop up on <a href="http://www.foxnews.com" rel="nofollow">Fox News</a> and <a href="http://www.metacritic.com" rel="nofollow">Metacritic</a> once in awhile, and I usually can’t stop myself from clicking on them. You get into sort of a click-trance. In fact, hang on a second. I think there might be one on the top opening acts when The Bieb performs in space. Oh yes there is! Okay, back. So, in your article list of the top movies that use a Meatloaf song in the soundtrack, adding that one from Black Sabbath is just not proper usage. We want Meatloaf and Meatloaf only, people! Besides, Black Sabbath is for sissies.</p> 
""" 

soup = BeautifulSoup(page, 'lxml') 

for content in soup.select('p > strong'): 
    print(content) 

Try it out here, and see w3schools' CSS selector documentation here.
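If you want only the text inside each strong tag rather than the tags themselves, Tag.get_text() can be applied to each match returned by .select. A small sketch, again assuming a shortened copy of the HTML and the built-in "html.parser":

```python
from bs4 import BeautifulSoup

# Shortened version of the question's HTML (assumption)
page = """
<p><strong>1. Start big</strong><br><br> Make a slam dunk right away.</p>
<p><strong>2. Hook them and hook them good</strong><br><br> Make the first few count, okay?</p>
"""

soup = BeautifulSoup(page, "html.parser")

# Collect just the heading text of every <strong> that is a direct child of a <p>
headings = [tag.get_text(strip=True) for tag in soup.select("p > strong")]
print(headings)  # → ['1. Start big', '2. Hook them and hook them good']
```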


Or just call find again: 'print(content.find('strong'))' – erocoar


Yes, especially if you want to do other things with 'content' at the same time. – cssko
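The nested find approach keeps each p tag in hand, which helps when you also need other parts of the paragraph. A sketch of that idea, pairing each heading with the rest of its paragraph's text; the "items" dictionary and the skip-paragraphs-without-a-heading rule are illustrative choices, not from the thread, and the HTML is a shortened assumption:

```python
from bs4 import BeautifulSoup

# Shortened HTML (assumption), plus one paragraph with no heading
page = """
<p><strong>1. Start big</strong><br><br> Make a slam dunk right away.</p>
<p><strong>2. Hook them and hook them good</strong><br><br> Make the first few count, okay?</p>
<p>A paragraph without a heading.</p>
"""

soup = BeautifulSoup(page, "html.parser")

items = {}  # hypothetical result layout: heading text -> body text
for content in soup.find_all("p"):
    strong = content.find("strong")  # first <strong> inside this <p>, or None
    if strong is None:
        continue  # skip paragraphs that have no heading
    heading = strong.get_text(strip=True)
    # Body = every text node in the <p> that is not inside a <strong>
    body = " ".join(
        s.strip()
        for s in content.find_all(string=True)
        if s.find_parent("strong") is None and s.strip()
    )
    items[heading] = body

for heading, body in items.items():
    print(heading, "->", body)
```

Because each string is checked with find_parent("strong"), text inside links such as the a tags in the third list item would still be kept as part of the body.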
