How do I parse text from HTML using lxml?

<p> 
    Glassware veteran 
    <strong>Corning </strong> 
    (
    <span class="ticker"> 
     NYSE: 
     <a class="qsAdd qs-source-isssitthv0000001" href="http://caps.fool.com/Ticker/GLW.aspx?source=isssitthv0000001" data-id="203758">GLW</a> 
    </span> 
    <a class="addToWatchListIcon qsAdd qs-source-iwlsitbut0000010" href="http://my.fool.com/watchlist/add?ticker=&source=iwlsitbut0000010" title="Add to My Watchlist"> </a> 
    ) has fallen on hard times lately. Is it time to give up on the stock, or will Corning have a banana and a comeback? 
</p>

I want to get "Glassware veteran" and ") has fallen on hard times lately. Is it time to give up on the stock, or will Corning have a banana and a comeback?", but when I use this code:

tnode = root.xpath("/p")[0]  # xpath() returns a list, so take the first match
content = tnode.text

I only get "Glassware veteran". Why is that?

Source

2012-12-06 yinyao

Something like this should get you what you want:

>>> tnode = root.xpath('/p')[0]  # xpath() returns a list, so take the first match
>>> content = tnode.xpath('text()')
>>> print(''.join(content))

Glassware veteran 

(


) has fallen on hard times lately. Is it time to give up on the stock, or will Corning have a banana and a comeback? 
>>>
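
The reason `.text` stops early may be easier to see in a small sketch. In lxml's tree model, an element's `.text` holds only the text before its first child, and any text that follows a child is stored on that child as `.tail`. The `snippet` below is a trimmed-down stand-in for the `<p>` in the question (an assumption made for brevity), and the variable names are illustrative only.

```python
from lxml import etree

# Trimmed-down stand-in for the <p> from the question (assumption, for brevity).
snippet = (
    "<p>Glassware veteran <strong>Corning</strong> ("
    "<span>NYSE: <a>GLW</a></span>) has fallen on hard times lately.</p>"
)
p = etree.fromstring(snippet)

# .text is only the text that precedes the first child element.
first_piece = p.text

# Text following a child element lives on that child as .tail.
tails = [child.tail for child in p]

# The XPath text() step collects every text node that is a direct child of <p>,
# which is .text plus the tails of the direct children.
direct_text = p.xpath("text()")

print(repr(first_piece))
print(tails)
print(direct_text)
```

So `text()` recovers the pieces that belong to `<p>` itself, while the text inside `<strong>`, `<span>`, and `<a>` is still left out.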

If you want all of the text nodes, just use //text() instead of text():

>>> print(' '.join([x.strip() for x in ele.xpath('//text()')]))
Glassware veteran Corning (NYSE: GLW ) has fallen on hard times lately. Is it time to give up on the stock, or will Corning have a banana and a comeback?
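
One caveat worth sketching: `//text()` is an absolute path, so it returns every text node in the whole document even when called on a single element. If the `<p>` sits inside a larger page, `.//text()` (relative to the element) or `itertext()` is probably what you want. The document below is a made-up example (the `<div>` and `<h1>` are assumptions, not part of the question), and the whitespace cleanup is just one possible approach.

```python
from lxml import etree

# Hypothetical page: the <p> from the question wrapped in a larger document.
page = """<div><h1>Title</h1><p>
    Glassware veteran
    <strong>Corning </strong>
    (<span>NYSE: <a>GLW</a></span>)
    has fallen on hard times lately.
</p></div>"""
root = etree.fromstring(page)
p = root.xpath("//p")[0]

# Absolute path: every text node in the document, including the <h1>.
all_doc = "".join(p.xpath("//text()"))

# Relative path: only text nodes inside <p>, in document order.
only_p = "".join(p.xpath(".//text()"))

# itertext() walks the same text nodes without XPath.
same_as_only_p = "".join(p.itertext())

# Collapse the newlines and indentation into single spaces.
clean = " ".join(only_p.split())
print(clean)
```

Joining first and normalising whitespace afterwards keeps the nested text in document order, which avoids the out-of-order result you get from a union such as `/p | /p/strong | /p/span`.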

Source

2012-12-06 15:13:18 larsks

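Thank you very much. But now I have a new problem. I want to get "Glassware veteran Corning (NYSE: GLW) has fallen on hard times lately. Is it time to give up on the stock, or will Corning have a banana and a comeback?" Using the code: tnode = root.xpath('/p | /p/strong | /p/a | /p/span') content = tnode.xpath('text()') print ''.join(content) the output instead ends with "... will Corning have a banana and a comeback? Corning NYSE: GLW". Any ideas? Thanks. – yinyao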

I've updated the answer. – larsks
