Grabbing text between all tags in Nokogiri?

What would be the most efficient way of grabbing all the text between HTML tags?

<div> 
<a> hi </a> 
....

A bunch of text enclosed in HTML tags.


2009-10-03 KJW

Check out https://github.com/rgrove/sanitize – Abram

doc = Nokogiri::HTML(your_html) 
doc.xpath("//text()").to_s


2009-10-03 05:38:39 khelll

Thanks! Works great. +1 – rusllonrails

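Use a SAX parser. It's much faster than the XPath option, since it streams through the document instead of building a full DOM tree first.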

require "nokogiri"

some_html = <<-HTML
<html>
  <head>
    <title>Title!</title>
  </head>
  <body>
    This is the body!
  </body>
</html>
HTML

# Collects every non-blank piece of text the SAX parser reports
class TextHandler < Nokogiri::XML::SAX::Document
  attr_reader :chunks

  def initialize
    @chunks = []
  end

  # Treat CDATA blocks like ordinary text
  def cdata_block(string)
    characters(string)
  end

  def characters(string)
    @chunks << string.strip unless string.strip.empty?
  end
end

th = TextHandler.new
parser = Nokogiri::HTML::SAX::Parser.new(th)
parser.parse(some_html)
puts th.chunks.inspect


2009-10-10 17:34:10

How could this be changed to get only the text between the body tags? – Omnipresent

Set a flag: start capturing when you see the body tag open, and stop capturing once the body tag closes. –

Here's how you would get all the text in this page's question div:

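require 'nokogiri'
require 'open-uri'

# URI.open fetches the page (on Ruby 3.0+, plain Kernel#open no longer opens URLs)
doc = Nokogiri::HTML(URI.open("http://stackoverflow.com/questions/1512850/grabbing-text-between-all-tags-in-nokogiri"))

# .text returns only the text inside #question; .to_s would return its HTML
puts doc.css("#question").text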


2009-10-14 04:44:29 pjb3

Just do:

doc = Nokogiri::HTML(your_html) 
doc.xpath("//text()").text


2013-01-06 21:02:10 arturodz
