Unable to create a regex that matches the article tag

I am trying to create a regex that matches the article tag and captures all the text inside it.

I tried the following regexes:

<article (.*?)</article> 

(?:<article>)(.*?)(?:</article>)

None of them work. I need everything inside the article tag, so please help.

<article id="post-82" class="post-82 post type-post status-publish format-standard hentry category-publishing"> 
     <div class="entry-content clearfix">   
         <div class="abh_box abh_box_up abh_box_drop-down"><ul class="abh_tabs"> <li class="abh_about abh_active"> 
<p>With India playing host,</p> 
    <footer class="entry-meta-bar clearfix"><div class="entry-meta clearfix"> 
       <span class="comments"><a href="http://www.test.com/blog/emerging-markets/#respond">No Comments</a></span>   

     </div></footer> 
    </article>

This is my article.

Source

2016-11-03 Glory Jain

Because regex is the wrong tool for parsing HTML. It will never work perfectly. Use an HTML parser (HtmlAgilityPack works) and win. – spender

Don't use regex to parse HTML. You can try HtmlAgilityPack instead:

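using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(htmlContent);

// "//article" searches the whole document; SelectSingleNode returns null when nothing matches.
var result = doc.DocumentNode.SelectSingleNode("//article");
string articleHtml = result?.InnerHtml;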

Source

2016-11-03 10:57:37 mybirthname

Why the downvote? – mybirthname

Use an HTML parser such as Html Agility Pack. If you still want a regex, you can try this one:

<article\b[^>]*>((.|\n)*?)<\/article>

https://regex101.com/r/oOJ9bt/2
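
For reference, here is a minimal Java sketch of why the two patterns from the question fail on the sample markup, and how a variant with an explicit tag name and DOTALL behaves. The class name ArticleRegexDemo, the helper firstArticleBody and the shortened sample HTML are made up for illustration, and the FIXED pattern is my own adjustment, not necessarily the one saved on regex101.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ArticleRegexDemo {
    // Attempt 1 from the question: '.' does not match line breaks by default.
    static final Pattern ATTEMPT_1 = Pattern.compile("<article (.*?)</article>");
    // Attempt 2 from the question: a literal "<article>" cannot match a tag that has attributes.
    static final Pattern ATTEMPT_2 = Pattern.compile("(?:<article>)(.*?)(?:</article>)");
    // Adjusted (assumption): \b ends the tag name, [^>]* skips attributes, DOTALL lets '.' span lines.
    static final Pattern FIXED = Pattern.compile(
            "<article\\b[^>]*>(.*?)</article>",
            Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    /** Returns the text between the first opening article tag and the next closing one, if any. */
    static Optional<String> firstArticleBody(Pattern pattern, String html) {
        Matcher m = pattern.matcher(html);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Shortened stand-in for the markup in the question.
        String html = "<article id=\"post-82\" class=\"post-82 post\">\n"
                + "  <div class=\"entry-content clearfix\">\n"
                + "    <p>With India playing host,</p>\n"
                + "  </div>\n"
                + "</article>";
        System.out.println(firstArticleBody(ATTEMPT_1, html).isPresent());
        System.out.println(firstArticleBody(ATTEMPT_2, html).isPresent());
        System.out.println(firstArticleBody(FIXED, html).orElse("no match"));
    }
}
```

Even the adjusted pattern stops at the first closing tag it sees, so nested article elements or a closing tag inside a comment or script will still trip it up. That is the usual reason people recommend a real parser.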

Source

2016-11-03 11:18:06

It doesn't work. –

You can see it working here: https://regex101.com/r/oOJ9bt/2 –

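You don't want to use regex for something like this, and you don't need to load an XML parser either. Just call .getAttribute("innerHTML") on the desired element to get the HTML it contains.

For example, this gets just the article element from the HTML you provided, looked up by ID (assuming driver is a Selenium WebDriver that has already loaded the page):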

System.out.println(driver.findElement(By.id("post-82")).getAttribute("innerHTML"));

This gets the HTML for every article on the page:

for (WebElement article : driver.findElements(By.tagName("article"))) 
{ 
    System.out.println(article.getAttribute("innerHTML")); 
}

Source

2016-11-03 13:26:16 JeffC
