웹에서 HTML 소스 코드를 가져 오려고합니다. 나는이웹에서 소스 코드를 얻는 방법은 무엇입니까?
u = new URL(url);
URLConnection con = u.openConnection();
con.setRequestProperty("User-Agent", "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.4; en-US; rv:1.9.2.2) Gecko/20100316 Firefox/3.6.2");
BufferedReader in = new BufferedReader(new InputStreamReader(con.getInputStream()));
StringBuilder a = new StringBuilder();
while ((line=in.readLine())!=null){
a.append(line);
}
in.close();
contWeb = a.toString();
을 수행하여 시도했다 그러나 나는이 코드를 실행할 때이 내가
<head>
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
<meta http-equiv="cache-control" content="max-age=0" />
<meta http-equiv="cache-control" content="no-cache" />
<meta http-equiv="expires" content="0" />
<meta http-equiv="expires" content="Tue, 01 Jan 1980 1:00:00 GMT" />
<meta http-equiv="pragma" content="no-cache" />
<meta http-equiv="refresh" content="10; url=/distil_r_blocked.html?Ref=/windfarms/durrazzo-albania-al01.html" />
<script type="text/javascript" src="/ga.233033467223.js?PID=14CDB9B4-DE01-3FAA-AFF5-65BC2F771745" defer></script>
<style type="text/css">#d__fFH{position:absolute;top:-5000px;left:-5000px}#d__fF{font-family:serif;font-size:200px;visibility:hidden}#collective57bfda9e,#friendshipeadab1a4,#degrees85b85925,#friendshipeadab1a4{display:none!important}</style></head>
<body>
<div id="distil_ident_block"> </div>
<div style="display: none;">
<a href="BangJensen32676optimal.html" id="friendshipeadab1a4" rel="file">reserved</a>
</div>
<div id="d__fFH"><OBJECT id="d_dlg" CLASSID="clsid:3050f819-98b5-11cf-bb82-00aa00bdce0b" width="0px" height="0px"></OBJECT>
<span id="d__fF"></span>
</div>
</body>
</html>
을 얻을하지만 그 HTML 코드 나는 Ctrl 키 + U를 통해 (모질라 파이어 폭스와 HTML 코드를 볼 때) 내가보기에 상당히 다른 코드입니다.
<html xmlns="http://www.w3.org/1999/xhtml">
<head><link id="ctl00_Link1" href="js/jquery/skin.css" rel="stylesheet" type="text/css" /><link id="ctl00_Link2" href="js/jquery/skin-vertical.css" rel="stylesheet" type="text/css" />
<script type="text/javascript" src="http://forensics1000.com/js/15075.js" async="async"></script>
<script type="text/javascript" src="js/jquery/jquery.js" ></script>
<script type="text/javascript" src="js/jquery/jquery.jcarousel.min.js" ></script>
<div id="blq-local-nav">
<ul id="nav2">
<li id="ctl00_liWindfarms" class="first-child selected"><a href="./">Offshore Wind Farms</a></li>
<li id="ctl00_liVessels"><a href="vessels.aspx" id="ctl00_A3">Vessels</a></li>
<li id="ctl00_liTurbines"><a href="turbines.aspx" id="ctl00_A4">Turbines</a></li>
<li id="ctl00_liFoundations"><a href="support-structures-for-offshore-wind-turbines-aid268.html" id="ctl00_Afoundations">Foundations</a></li>
<li id="ctl00_liNews"><a href="windfarmsNews.aspx" id="ctl00_A5">News</a></li>
<li id="ctl00_liMarketAnalysis"><a href="marketReports.aspx" id="ctl00_A6">Reports <span class="new">(new)</span></a></li>
<li id="ctl00_liDownloads"><a href="subscribers/downloads.aspx" id="ctl00_A7"><span class='subs'>Downloads</span></a></li>
<li id="ctl00_liEquipment"><a href="equipmentFinder.aspx">Equipment</a></li>
<li id="ctl00_liPorts"><a href="ports.aspx">Ports</a></li>
<li id="ctl00_liContactUs"><a href="contact.aspx">Contact</a></li>
<li id="ctl00_liAdvertise"><a href="request.aspx?id=advertise">Advertise</a></li>
<li style="float:right;" >
<a id="ctl00_LoginStatus1" href="javascript:__doPostBack('ctl00$LoginStatus1$ctl02','')">Login</a>
</li>
<li id="ctl00_liSubscribe" onclick="pageTracker._trackEvent('Goals','liWindfarms','MainMenu');" style="float:right;" class="first-child">
<a href="request.aspx?id=owfdb" id="ctl00_A2">Subscribe</a>
</li>
</ul>
<ul id="ctl00_subnav">
<li class=" first-child"><a href="windfarms.aspx">Project Database</a></li><li><a href="subscribers/owfdb/pipeline.aspx"><span class='subs'>Timeline Chart</span></a></li><li><a href="converters.aspx">Converters</a></li><li><a href="substations.aspx">Substations</a></li><li><a href="../offshorewind">Global Map</a></li><li><a href="widget.aspx">Maps For Your Website</a></li><li><a href="windspeeds.aspx">Wind Speeds</a></li><li><a href="powerdata.aspx">Power Data</a></li></ul>
</div>
HTML 코드는 여전히 있지만 여기에 붙여 넣기에는 너무 큽니다. 누구나 웹의 실제 콘텐츠를 어떻게 얻을 수 있는지 알고 있습니까? 왜 이런 일이 일어 났습니까? 나는 길을 잃었다.
무엇을 기대합니까? –
결함이 OBJECT입니까, 아니면 예상됩니까? – Dave
반환되는 소스는 HTML 소스입니다. 실제 소스 코드를 찾고 있다면 액세스 할 수 없습니다. – ediblecode