2014-09-24

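Extracting a name from HTML

Hello. I know that Perl has several modules that can extract tags from an HTML source file, but I need to extract one specific value from this file, the text next to the "Name" label: CA. THAKRAR UTSAV SUBHASH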

http://regex101.com/r/dZ8mY1/1


Why don't you try an HTML parser?

Answer


Always use an HTML parser to parse HTML.

The following uses Mojo::DOM to find the value you are looking for. For a helpful 8-minute introductory video on this module, check out Mojocast Episode 5.

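use strict;
use warnings;

use Mojo::DOM;

# Slurp the HTML that follows __DATA__ and parse it.
my $dom = Mojo::DOM->new(do { local $/; <DATA> });

for my $td ($dom->find('td')->each) {
    # Only the cell whose text is exactly "Name" is of interest.
    next if $td->all_text ne 'Name';

    # Walk the following siblings, skipping text nodes and cells that hold
    # only punctuation (the ":" separator), until a cell with real content
    # is found.
    # Note: newer Mojolicious releases appear to have renamed next_sibling
    # to next_node and node to type; adjust for your installed version.
    my $next = $td;
    while ($next = $next->next_sibling) {
        last if $next->node eq 'tag' and $next->all_text !~ /^[[:punct:]\s]*$/;
    }

    # $next is undef when no value cell follows the label.
    print $next->all_text, "\n" if $next;
}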

__DATA__ 
<!DOCTYPE html> 
<html> 
<head> 
<meta http-equiv="Content-Language" content="en-us"> 
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"> 
<meta name="GENERATOR" content="Microsoft FrontPage 4.0"> 
<meta name="ProgId" content="FrontPage.Editor.Document"> 
<title>Member Card The Institute of Chartered Accountants of India</title> 

<script language="javascript" type="text/javascript"> 
<!-- var win=null; function NewWindow(mypage,myname,w,h,scroll,pos){ if(pos=="random"){LeftPosition=(screen.width)?Math.floor(Math.random()*(screen.width-w)):100;TopPosition=(screen.height)?Math.floor(Math.random()*((screen.height-h)-75)):100;} if(pos=="center"){LeftPosition=(screen.width)?(screen.width-w)/2:100;TopPosition=(screen.height)?(screen.height-h)/2:100;} else if((pos!="center" && pos!="random") || pos==null){LeftPosition=0;TopPosition=20} settings='width='+w+',height='+h+',top='+TopPosition+',left='+LeftPosition+',scrollbars='+scroll+',location=no,directories=no,status=no,menubar=no,toolbar=no,resizable=no'; win=window.open(mypage,myname,settings);} // --> 
</script> 
<script language="JavaScript1.1"> 
<!-- Original: Vivek Gupta --> <!-- Begin function right(e) { if (navigator.appName == 'Netscape' && (e.which == 3 || e.which == 2)) return false; else if (navigator.appName == 'Microsoft Internet Explorer' && (event.button == 2 || event.button == 3)) { alert("Sorry, you do not have permission to right click."); return false; } return true; } document.onmousedown=right; document.onmouseup=right; if (document.layers) window.captureEvents(Event.MOUSEDOWN); if (document.layers) window.captureEvents(Event.MOUSEUP); window.onmousedown=right; window.onmouseup=right; // End --> 
</script> 
</head> 
<body bgcolor="#ECFFFF"> 
<p align="center"><u><i><b><font size="5">Members Details as on 
Date</font></b></i></u></p> 
<hr> 
<div align="right"> 
<table border="0" width="100%"> 
<tr> 
<td width="13%" bgcolor="#CCCCFF"><font size="2"><b>Membership No.</b></font></td> 
<td width="2%" bgcolor="#99CCFF"><font size="2"><b>:</b></font></td> 
<td width="25%" bgcolor="#99CCFF"><font size="2"><b>140337,&nbsp;&nbsp;&nbsp;</b></font> <b><font color="#FF0000" size="3">ACTIVE</font></b></td> 
<td width="8%" bgcolor="#CCCCFF"><font size="2"><b>Sex</b></font></td> 
<td width="1%" bgcolor="#99CCFF"><font size="2"><b>:</b></font></td> 
<td width="18%" bgcolor="#99CCFF"><font size="2"><b>M</b></font></td> 
<td width="13%" bgcolor="#CCCCFF"><font size="2"><b>Date of Birth</b></font></td> 
<td width="1%" bgcolor="#99CCFF"><font size="2"><b>:</b></font></td> 
<td width="38%" bgcolor="#99CCFF"><font size="2"><b>30/12/1986</b></font></td> 
</tr> 
<tr> 
<td width="13%" bgcolor="#CCCCFF"><font size="2"><b>Name</b></font></td> 
<td width="2%" bgcolor="#99CCFF"><font size="2"><b>:</b></font></td> 
<td width="25%" bgcolor="#99CCFF"><font size="2"><b>CA. THAKRARUTSAV SUBHASH</b></font></td> 
<td width="8%" bgcolor="#CCCCFF"><font size="2"><b>Blood Grp</b></font></td> 
<td width="1%" bgcolor="#99CCFF"><font size="2"><b>:</b></font></td> 
<td width="18%" bgcolor="#99CCFF"><font size="2"><b>B (-)</b></font></td> 
<td width="13%" bgcolor="#CCCCFF"><font size="2"><b>Enrolment Dt.</b></font></td> 
<td width="1%" bgcolor="#99CCFF"><font size="2"><b>:</b></font></td> 
<td width="38%" bgcolor="#99CCFF"><font size="2"><b>29/07/2011</b></font></td> 
</tr> 
<tr> 
<td width="13%" bgcolor="#CCCCFF"><font size="2"><b>Asso.&nbsp;/Fellow</b></font></td> 
<td width="2%" bgcolor="#99CCFF">:</td> 
<td width="25%" bgcolor="#99CCFF"><font size="2"><b>ACA</b></font></td> 
<td width="8%" bgcolor="#CCCCFF"><font size="2"><b>Nationality</b></font></td> 
<td width="1%" bgcolor="#99CCFF"></td> 
<td width="18%" bgcolor="#99CCFF"><font size="2"><b>IND</b></font></td> 
<td width="13%" bgcolor="#CCCCFF"><font size="2"><b>FellowDate</b></font></td> 
<td width="1%" bgcolor="#99CCFF"><font size="2"><b>:</b></font></td> 
<td width="38%" bgcolor="#99CCFF"><font size="2"><b>&nbsp;</b></font></td> 
</tr> 
<tr> 
<td width="13%" bgcolor="#CCCCFF"><font size="2"><b>Father's Name</b></font></td> 
<td width="2%" bgcolor="#99CCFF"><font size="2"><b>:</b></font></td> 
<td width="25%" bgcolor="#99CCFF"><font size="2"><b>SUBHASH THAKRAR</b></font></td> 
<td width="8%" bgcolor="#CCCCFF"></td> 
<td width="1%" bgcolor="#99CCFF"></td> 
<td width="18%" bgcolor="#99CCFF"></td> 
<td width="13%" bgcolor="#CCCCFF"><b><font size="2">COP Status</font></b></td> 
<td width="1%" bgcolor="#99CCFF"><b>:</b></td> 
<td width="27%" bgcolor="#99CCFF"><font size="2"><b>FULLTIME</b></font></td> 
</tr> 
</table> 
</div> 
<hr> 
<div align="right"> 
<table border="0" width="100%"> 
<tr> 
<td width="50%" colspan="2" bgcolor="#CCCCFF"><u><font size="2"><b>Professional Address Details</b></font></u></td> 
<td width="50%" colspan="2" bgcolor="#CCCCFF"><u><font size="2"><b>Residential Address Details</b></font></u></td> 
</tr> 
<tr> 
<td width="50%" colspan="2"></td> 
<td width="50%" colspan="2"></td> 
</tr> 
<tr> 
<td width="50%" colspan="2" bgcolor="#99CCFF"><font size="2"><b>OPP PUNJAB NATIONAL BANK</b></font></td> 
<td width="50%" colspan="2" bgcolor="#99CCFF"><font size="2"><b>M/S CHATRABHUJ SAVJI &amp; CO</b></font></td> 
</tr> 
<tr> 
<td width="50%" colspan="2" bgcolor="#99CCFF"><font size="2"><b>SUTARWADA</b></font></td> 
<td width="50%" colspan="2" bgcolor="#99CCFF"><font size="2"><b>SUTARWADA</b></font></td> 
</tr> 
<tr> 
<td width="50%" colspan="2" bgcolor="#99CCFF"><font size="2"><b>&nbsp;</b></font></td> 
<td width="50%" colspan="2" bgcolor="#99CCFF"><font size="2"><b>&nbsp;</b></font></td> 
</tr> 
<tr> 
<td width="50%" colspan="2" bgcolor="#99CCFF"><font size="2"><b>&nbsp;</b></font></td> 
<td width="50%" colspan="2" bgcolor="#99CCFF"><font size="2"><b>&nbsp;</b></font></td> 
</tr> 
<tr> 
<td width="50%" colspan="2" bgcolor="#99CCFF"><font size="2"><b>PORBANDAR - 360575</b></font></td> 
<td width="50%" colspan="2" bgcolor="#99CCFF"><font size="2"><b>PORBANDAR - 360575</b></font></td> 
</tr> 
<tr> 
<td width="50%" colspan="2" bgcolor="#99CCFF"><font size="2"><b>INDIA</b></font></td> 
<td width="50%" colspan="2" bgcolor="#99CCFF"><font size="2"><b>INDIA</b></font></td> 
</tr> 
<tr> 
<td width="14%" bgcolor="#CCCCFF"><font size="2"><b>Tel. No.</b></font></td> 
<td width="36%" bgcolor="#99CCFF"><font size="2"><b>0286-2243863</b></font></td> 
<td width="14%" bgcolor="#CCCCFF"><font size="2"><b>Tel. No.</b></font></td> 
<td width="34%" bgcolor="#99CCFF"><font size="2"><b>0286 2245641</b></font></td> 
</tr> 
<tr> 
<td width="14%" bgcolor="#CCCCFF"><font size="2"><b>Fax. No.</b></font></td> 
<td width="36%" bgcolor="#99CCFF"><font size="2"><b>&nbsp;</b></font></td> 
<td width="14%" bgcolor="#CCCCFF"><font size="2"><b>Fax. No.</b></font></td> 
<td width="34%" bgcolor="#99CCFF"><font size="2"><b>&nbsp;</b></font></td> 
</tr> 
<tr> 
<td width="14%" bgcolor="#CCCCFF"><font size="2"><b>Mob. No.</b></font></td> 
<td width="36%" bgcolor="#99CCFF"><font size="2"><b>09409059418</b></font></td> 
<td width="14%" bgcolor="#CCCCFF"><font size="2"><b>Mob. No.</b></font></td> 
<td width="34%" bgcolor="#99CCFF"><font size="2"><b>09409059418</b></font></td> 
</tr> 
<tr> 
<td width="14%" bgcolor="#CCCCFF"><font size="2"><b>E-mail.</b></font></td> 
<td width="36%" bgcolor="#99CCFF"><font size="2"><b>[email protected]</b></font></td> 
<td width="14%" bgcolor="#CCCCFF"><font size="2"><b>E-mail.</b></font></td> 
<td width="34%" bgcolor="#99CCFF"><font size="2"><b>[email protected]</b></font></td> 
</tr> 
</table> 
</div> 
<hr> 
<div align="right"> 
<table border="0" width="100%"> 
<tr> 
<td width="29%"><a href="locm_res.asp?MRH_MRN=140337" onclick="NewWindow(this.href,'mywin','500','400','no','center');return false" onfocus="this.blur()"><b><font color="#0000FF">Member Employment Details</font></b></a></td> 
<td width="27%"><a href="locm_ocp.asp?MRH_MRN=140337" onclick="NewWindow(this.href,'mywin','900','400','no','center');return false" onfocus="this.blur()"><b><font color="#0000FF">Member Firm Association Details</font></b></a></td> 
<td width="44%"><b><a href="locm_article.asp?MRH_MRN=140337" onclick="NewWindow(this.href,'mywin','900','400','yes','center');return false" onfocus="this.blur()"><font color="#0000FF">Article/Audit (List of Student undergoing Training with details)</font></a></b></td> 
</tr> 
<tr> 
<td width="29%">.</td> 
<td width="27%"></td> 
<td width="44%">.</td> 
</tr> 
<tr> 
<td width="100%" colspan="3" align="center"><b><a href="firm_approval.asp" onclick="NewWindow(this.href,'mywin','900','400','yes','center');return false" onfocus="this.blur()"><font color="#0000FF">Search Firm Registered/Approved with ICAI as on Date</font></a></b></td> 
</tr> 
</table> 
</div> 
<p>&nbsp;</p> 
<hr> 
<p>&nbsp;</p> 
</body> 
</html> 

Output:

CA. THAKRARUTSAV SUBHASH 
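
The same lookup can also be wrapped in a small reusable function, so that other labels on the page (for example "Blood Grp") can be read the same way. The following is only a sketch under some assumptions: `value_for_label` is a hypothetical helper name, not part of Mojo::DOM; the trimmed-down table stands in for the full page; and it assumes a Mojolicious version recent enough to provide `following` on Mojo::DOM.

```perl
use strict;
use warnings;

use Mojo::DOM;

# Hypothetical helper (not part of Mojo::DOM): return the text of the first
# non-separator <td> that follows the <td> whose text equals $label.
sub value_for_label {
    my ($dom, $label) = @_;

    # Trim explicitly so the match does not depend on how all_text
    # treats surrounding whitespace in the installed version.
    my $trim = sub { my $s = shift; $s =~ s/^\s+|\s+$//g; $s };

    # Find the label cell; give up if the label is not on the page.
    my $label_td = $dom->find('td')->first(sub { $trim->($_->all_text) eq $label })
        or return;

    # following('td') yields the later sibling <td> elements in document
    # order; skip cells that contain only punctuation or whitespace.
    my $value_td = $label_td->following('td')
        ->first(sub { $_->all_text !~ /^[[:punct:]\s]*$/ })
        or return;

    return $trim->($value_td->all_text);
}

# Reduced sample with the same label / separator / value layout as the page.
my $html = <<'HTML';
<table>
<tr>
<td><b>Name</b></td><td><b>:</b></td><td><b>CA. THAKRARUTSAV SUBHASH</b></td>
<td><b>Blood Grp</b></td><td><b>:</b></td><td><b>B (-)</b></td>
</tr>
</table>
HTML

my $dom  = Mojo::DOM->new($html);
my $name = value_for_label($dom, 'Name');
print defined $name ? "$name\n" : "Name not found\n";
```

Returning nothing when the label or its value cell is missing lets the caller decide how to handle an incomplete page instead of dying on an undefined value.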