2012-11-28 7 views

Return an uncertain number of paragraphs in BeautifulSoup

I have just started with BeautifulSoup. I am trying to write a script that visits a number of very similar pages and returns all the paragraphs under each section. Currently I have the following code:

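import urllib.request

from bs4 import BeautifulSoup as Soup

def BrightstormPageTest():
    soup = Soup(urllib.request.urlopen('http://brightstorm.com/science/chemistry/chemical-reaction-rates/collision-theory/').read(), 'html.parser')
    relevantTagText = ""
    for element in soup.find_all("section"):
        # Prints only the node immediately after each <section>.
        print(element.next_sibling)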

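This works great for the first paragraph, but I want all the sibling paragraphs that follow a section. There are two sections: the first always has exactly one paragraph, while the second is unpredictable, probably somewhere between 1 and 10 paragraphs. Any ideas on how to do this?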

The relevant HTML is:

<section> 
     <div class="page-header"> 
     <h2> 
     Explanation 
     </h2> 
     </div> 
    </section> 
    <p> 
     <strong> 
     Collision theory 
     </strong> 
     is a model for explaining chemical reactions and reaction rates using the interactions of particles within the reactants. There are three important parts to 
     <strong> 
     collision theory 
     </strong> 
     , that reacting substances must collide, that they must collide with enough energy and that they must collide with the correct orientation. Increasing the kinetic energy of these particles or decreasing their volume increases the frequency of collisions and speeds a reaction. 
    </p> 
    <section> 
     <div class="page-header"> 
     <h2> 
     Transcript 
     </h2> 
     </div> 
    </section> 
    <p> 
     Alright so we're going to talk about the collision theory. And the collision theory comes into play when you're talking about reactions and actually what happens in a reaction and how a reaction actually goes from the reactant all the way to the product. So the first thing we're going to have to discuss is, the fact that the reacting substances whatever we're dealing with the atoms, ions or molecules must collide in order for the reaction to occur. Okay that seems pretty obvious so we have our 2 reactants a and b and they must collide, and this is what we're going to call activated complex or a transition states that's going from, transitioning from the reactants towards the product and it's going to recreate this independent, very high energy activated complex and then yield our products, our 2ab. So the first postulate is that they must come together, okay that's easy enough. 
    </p> 
    <p> 
     The second one says the reactant substances must collide with sufficient energy in order to form that activated complex. Because this activated complex is extremely high, very high in energy, very unstable so they must collide with a certain amount of energy to get to this point. If they don't collide with a good amount of energy then they're actually not going to react at all. So that energy is going to be called our activation energy to get to our activated complex. And you might see the symbol e with a subscript a to note that. And the last thing in the collision theory is that reacting substances must collide with the correct orientation so if they, made a collision at a range that wasn't great for them, they would actually rebound off of each other and not react at all. 
    </p> 
    <p> 
     But if they if they did they have to make sure they line up correctly and then for the correct reaction to occur then they get their activated complex to form the products. And so these 3 things are the basis of that collision theory and how reactants go from reactants to the products. 
    </p> 

I just want to do some work on the text inside these paragraphs.

Answers


You need to loop over the sections and then loop over the paragraphs. For demonstration purposes, I modified the code to print the text of each paragraph.

import urllib.request

from bs4 import BeautifulSoup as Soup

URL = 'http://brightstorm.com/science/chemistry/chemical-reaction-rates/collision-theory/'

def BrightstormPageTest():
    # Prints the text of every <p> nested inside a <section>.
    soup = Soup(urllib.request.urlopen(URL).read(), 'html.parser')
    for section in soup.find_all("section"):
        for p in section.find_all("p"):
            print(p.text)

def BrightstormPageTest2():
    # Prints the text of the <p> tags that directly follow each <section>.
    soup = Soup(urllib.request.urlopen(URL).read(), 'html.parser')
    for section in soup.find_all("section"):
        # find_next_siblings(True) returns tags only, so whitespace strings are skipped.
        for sibling in section.find_next_siblings(True):
            if sibling.name != "p":
                # Stop at the first tag that is not a paragraph, e.g. the next <section>.
                break
            print(sibling.text)

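The paragraphs are not actually inside the sections. Each section comes before its paragraphs, and the only other identifier is a massive div that also contains many less relevant paragraphs. –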
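
The point made in this comment can be checked against a shortened copy of the HTML from the question. The following is a minimal sketch, assuming Python 3 and the `bs4` package with the built-in `html.parser`; the `HTML` string and the variable names are made up for illustration.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the markup quoted in the question (an assumption,
# not the real page).
HTML = """
<section><div class="page-header"><h2>Explanation</h2></div></section>
<p>Collision theory is a model.</p>
<section><div class="page-header"><h2>Transcript</h2></div></section>
<p>First transcript paragraph.</p>
<p>Second transcript paragraph.</p>
"""

soup = BeautifulSoup(HTML, "html.parser")
first_section = soup.find("section")

# Paragraphs nested inside the <section> element itself.
nested = first_section.find_all("p")

# Paragraphs that come after the <section> at the same nesting level.
following = first_section.find_next_siblings("p")

print(len(nested), len(following))
```

With this markup `nested` is empty, while `following` holds every later `<p>`, including the ones that belong to the second section. A sibling-based solution therefore also has to stop at the next `<section>`.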


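'paragraph text' A 'section' element is actually everything between the opening tag and the closing tag, not just the opening tag itself. So the paragraphs are actually inside the section. – kreativitea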


No, no, it's

So I had to use siblings. –
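
The sibling-based approach mentioned in this comment could look roughly like the following. It is only a sketch, assuming Python 3 and `bs4`; the helper name `paragraphs_by_section` and the sample `HTML` string are hypothetical and do not come from the thread. For each `<section>` it walks the following sibling tags and stops at the first one that is not a `<p>`, so each heading keeps only its own paragraphs, however many there are.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the markup quoted in the question (an assumption).
HTML = """
<section><div class="page-header"><h2>Explanation</h2></div></section>
<p><strong>Collision theory</strong> is a model.</p>
<section><div class="page-header"><h2>Transcript</h2></div></section>
<p>First transcript paragraph.</p>
<p>Second transcript paragraph.</p>
"""


def paragraphs_by_section(html):
    """Map each section heading to the text of the <p> tags that follow it."""
    soup = BeautifulSoup(html, "html.parser")
    result = {}
    for section in soup.find_all("section"):
        heading = section.get_text(strip=True)
        paragraphs = []
        # find_next_siblings(True) returns tags only, skipping whitespace strings.
        for sibling in section.find_next_siblings(True):
            if sibling.name != "p":
                break  # reached the next <section> or some other tag
            paragraphs.append(sibling.get_text(" ", strip=True))
        result[heading] = paragraphs
    return result


groups = paragraphs_by_section(HTML)
for heading, paragraphs in groups.items():
    print(heading, len(paragraphs))
```

Returning a dictionary keyed by heading rather than printing inside the loop keeps the scraping step separate from whatever work is then done on the paragraph text.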
