2015-01-09 1 views
2

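HtmlUnit - How do I extract values from between nested div tags?

I'm trying to build a simple web scraper for a class project. It scrapes a random Facebook page for publicly available information (the pages that the user likes). I'm using HtmlUnit, and I can get back the page that contains the relevant information, but I'm having trouble extracting strings from the nested divs. I need a list of all of the user's likes, which is the text inside each mediaPageName div in the HTML at the end of this question.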

My code so far:

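import java.io.IOException;
import java.util.List;
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.DomElement;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

WebClient webClient = new WebClient(BrowserVersion.CHROME);
webClient.getOptions().setRedirectEnabled(true);
webClient.getOptions().setCssEnabled(false);
webClient.getOptions().setThrowExceptionOnScriptError(false);
webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
webClient.getOptions().setUseInsecureSSL(true);
webClient.getOptions().setJavaScriptEnabled(true);
webClient.getCookieManager().setCookiesEnabled(true);
try {
    HtmlPage page = webClient.getPage(URL); // URL is a random Facebook page
    // Give background JavaScript up to 10 seconds to finish
    webClient.waitForBackgroundJavaScript(10000);

    // Walk every div and print the ones whose class is exactly "mediaPageName"
    List<DomElement> divs = page.getElementsByTagName("div");
    for (DomElement element : divs) {
        if (element.getAttribute("class").equals("mediaPageName")) {
            System.out.println(element.getNodeValue());
        }
    }
} catch (IOException e) {
    e.printStackTrace();
}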

This does not work. Is it even possible to extract strings that are nested this deep? Any help would be greatly appreciated.
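One thing worth checking in the loop above: in the W3C DOM, getNodeValue() is defined to return null for element nodes, while getTextContent() returns the text of the element and its descendants. HtmlUnit's DomElement implements the same org.w3c.dom.Element interface, so the same rule should apply there. The sketch below shows the difference using only the JDK's own DOM parser; the class name NodeValueDemo and the one-line sample markup are assumptions made for illustration, not part of the original post.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class; uses the JDK DOM parser rather than HtmlUnit.
final class NodeValueDemo {

    // Parses a tiny fragment shaped like the markup in the question
    // and returns its only div element.
    static Element parseSample() {
        String xml = "<a class=\"mediaRowItem\">"
                + "<div class=\"mediaPageName\">Queen</div></a>";
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return (Element) doc.getElementsByTagName("div").item(0);
        } catch (Exception e) {
            throw new IllegalStateException("sample markup failed to parse", e);
        }
    }

    public static void main(String[] args) {
        Element div = parseSample();
        // Element nodes carry no node value of their own.
        System.out.println(div.getNodeValue());   // prints "null"
        // The text lives in a child text node; getTextContent() collects it.
        System.out.println(div.getTextContent()); // prints "Queen"
    }
}
```

If the divs are present in the page that HtmlUnit built, swapping getNodeValue() for getTextContent() in the loop would be the first thing to try.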

EDIT: As requested, here is the full HTML (of the body):

<body class="timelineLayoutLoggedOutUserProfile timelineLayoutLoggedOut _4lh timelineLayout fbx UIPage_LoggedOut _2gsg gecko win x1 Locale_en_GB" dir="ltr"><div class="_li"><div id="pagelet_bluebar" role="banner"><div id="blueBarDOMInspector" class="_21mm"><div id="blueBarNAXAnchor" class="_4f7n _xxp"><div><div class="loggedout_menubar_container"><div class="clearfix loggedout_menubar"><div class="lfloat _ohe"><h1><a href="/" title="Go to Facebook Home"><i class="fb_logo img sp_9vUokIDmpP8 sx_15c231"><u>Facebook logo</u></i></a></h1></div><div class="menu_login_container rfloat _ohf"><form id="login_form" action="https://www.facebook.com/login.php?login_attempt=1" method="post" onsubmit="return window.Event &amp;&amp; Event.__inlineSubmit &amp;&amp; Event.__inlineSubmit(this,event)"><input name="lsd" value="AVpZmDWW" autocomplete="off" type="hidden"><table role="presentation" cellspacing="0"><tbody><tr><td class="html7magic"><label for="email">Email or Phone</label></td><td class="html7magic"><label for="pass">Password</label></td></tr><tr><td><input class="inputtext" name="email" id="email" value="" tabindex="1" type="text"></td><td><input class="inputtext" name="pass" id="pass" tabindex="2" type="password"></td><td><label class="uiButton uiButtonConfirm" id="loginbutton" for="u_0_0"><input value="Log in" tabindex="4" id="u_0_0" type="submit"></label></td></tr><tr><td class="login_form_label_field"><div><div class="uiInputLabel clearfix uiInputLabelLegacy"><input id="persist_box" name="persistent" value="1" tabindex="3" class="uiInputLabelInput uiInputLabelCheckbox" type="checkbox"><label for="persist_box" class="uiInputLabelLabel">Keep me logged in</label></div><input name="default_persistent" value="0" type="hidden"></div></td><td class="login_form_label_field"><a href="https://www.facebook.com/recover/initiate">Forgotten your password?</a></td></tr></tbody></table><input autocomplete="off" name="timezone" value="0" id="u_0_1" type="hidden"><input name="lgnrnd" 
value="065157_G5Yp" type="hidden"><input id="lgnjs" name="lgnjs" value="1420815122" type="hidden"><input autocomplete="off" id="locale" name="locale" value="en_GB" type="hidden"><input autocomplete="off" name="next" value="https://www.facebook.com/steven.mcguckin.14" type="hidden"><input value="W1tbMyw0LDEzLDQxLDQyLDYxLDcwLDczLDgxLDEyNywxODEsMTkwLDIwMCwyMTUsMjE2LDIxOSwyMjYsMjQwLDI1NiwyODAsMjg5LDI5NiwyOTcsMzE1LDMxOCwzMjEsMzQ0LDM2Nyw0MDYsNDQzLDQ0Nyw0ODYsNTAwLDU1MCw1NzgsNTg0LDU4OSw1OTEsNjAxLDYwNSw2NTAsNjU1XV0sIkFabkJCVm5sUnpyRlZZeTVLSEZ1d1FGZW1ZeEowV3p4XzFNSExQV2ZPbzNkb0NWdDh5MnhZQkNLN0dBYnA4dmRhN3YydDZaN29aVzQ4NmhzczlsN2E3ZmZUSXl4dmI3eUhtNWRRQU04amZfdy1BSVBQYUpBNVAwb2ViYzJ5aUNnTDhLRGwteE0xUWFHUllNNnV0alk3aHkwc21BOVRpcFR4NV9kVUhwc3l0enZGekZ3ZWhJZUI3ZXpMaXZKWEFNWG1HVzk4THVhdjRjZjRNcnl1V0lrOS1uX1gweUZmVWFsb3NuYTlBU01td0FwZ2ciXQ==" name="qsstamp" type="hidden"></form></div></div></div></div></div></div></div><div id="globalContainer" class="uiContextualLayerParent"><div class="fb_content clearfix " id="content" role="main"><div><div id="toolbarContainer" class="hidden_elem"></div><div id="mainContainer"><div id="leftCol"></div><div id="contentCol" class="clearfix hasRightCol"><div id="rightCol" role="complementary"><div id="rightColContent"></div></div><div id="contentArea" role="main"><div class="_5h60" id="pagelet_timeline_main_column" data-referrer="pagelet_timeline_main_column" data-gt="{&quot;profile_owner&quot;:&quot;100003774926665&quot;,&quot;ref&quot;:&quot;timeline:timeline&quot;}"><div class="timelineLoggedOutSignUpWithoutCover"><div class="_5h60" id="pagelet_loggedout_sign_up" data-referrer="pagelet_loggedout_sign_up"><div class="pam uiBoxOverlay bottomborder"><div class="fsxl fwb">Steven Mcguckin<br> is on Facebook.</div><div class="mvm fsl">To connect with Steven, sign up for Facebook today.</div><a class="uiButton uiButtonSpecial uiButtonLarge" 
href="/r.php?profile_id=100003774926665&amp;next=https%3A%2F%2Fwww.facebook.com%2Fsteven.mcguckin.14&amp;friend_or_subscriber=friend" role="button"><span class="uiButtonText">Sign Up</span></a><a class="uiButton uiButtonConfirm uiButtonLarge" href="/login.php?next=https%3A%2F%2Fwww.facebook.com%2Fsteven.mcguckin.14" role="button" name="login"><span class="uiButtonText">Log in</span></a></div></div></div><div class="fbTimelineTopSectionBase _6-d _529n _6_5"><div class="_5h60" id="pagelet_above_header_timeline" data-referrer="pagelet_above_header_timeline"></div><div id="above_header_timeline_placeholder"></div><div class="fbTimelineSection mtm fbTimelineTopSection fbTimelineLoggedOutTopSection"><div id="fbProfileCover"><div class="cover" id="u_0_2"><div class="coverEmptyWrap _37fg coverImage coverNoImage" id="fbCoverImageContainer" data-cropped="1"><img class="coverChangeThrobber img" src="https://fbstatic-a.akamaihd.net/rsrc.php/v2/yk/r/LOOn0JtHNzb.gif" alt="" height="16" width="16"></div></div><div id="fbTimelineHeadline" class="clearfix"><div class="actions"><div class="_5h60 actionsDropdown" id="pagelet_timeline_profile_actions" data-referrer="pagelet_timeline_profile_actions"></div></div><div class="name"><div class="photoContainer"><div><a class="profilePicThumb" href="https://www.facebook.com/photo.php?fbid=114855338650296&amp;set=a.114854815317015.15000.100003774926665&amp;type=1&amp;source=11" rel="theater"><img class="profilePic img" alt="Steven Mcguckin" src="https://fbcdn-profile-a.akamaihd.net/hprofile-ak-prn2/v/t1.0-1/c46.46.577.577/s160x160/540497_114855338650296_1335437651_n.jpg?oh=8d3b224ef56c50a8a043ce49901e5d5f&amp;oe=556CFE3E&amp;__gda__=1429368453_dff0e7bb5ca2eca91432ccdb97bc5d92"></a></div><meta itemprop="image" 
content="https://fbcdn-profile-a.akamaihd.net/hprofile-ak-prn2/v/t1.0-1/c46.46.577.577/s50x50/540497_114855338650296_1335437651_n.jpg?oh=054f9dc65e6f67982ad9aafe2229cce4&amp;oe=55241D58&amp;__gda__=1429778354_e0db2aeed8623991fe8eb84f7253f0b5"></div><h2 itemprop="name">Steven Mcguckin</h2></div></div></div></div></div><div class="timelineLoggedOutPagelet"><div class="clearfix"><div class="timelineLoggedOutMain lfloat _ohe"><div class="_5h60 allFavorites" id="pagelet_all_favorites" data-referrer="pagelet_all_favorites"><div class="fbTimelineSection mtm timelineFavorites fbTimelineCompactSection"><div class="profileInfoSection" id="favorites"><div class="uiHeader fbTimelineAboutMeHeader"><div class="clearfix uiHeaderTop"><div><h4 class="uiHeaderTitle">Favourites</h4></div></div></div><div class="phs"><table role="presentation" class="mtm _5e7- profileInfoTable _3stp _3stn"><tbody><tr><th class="label"><div class="labelContainer">Music</div></th><td class="data"><div class="mediaRowWrapper"><ul class="uiList pbl mediaRow _509- _4ki _6-h _704 _6-i"><li><div class="mediaPortrait"><div class="profilePicContainer"><div class="blackBackground"></div><img class="photo img" src="https://fbcdn-profile-a.akamaihd.net/hprofile-ak-xpf1/v/t1.0-1/p80x80/407738_10150488772887362_814101701_n.jpg?oh=25075b281b5dcecf3ab6e63c07b984c5&amp;oe=556D1F73&amp;__gda__=1428710219_9c8849d34cea33b428bce751dd612694" title="Queen" height="75" width="75"><div class="likeButtonContainer"></div></div><a class="mediaRowItem" href="https://www.facebook.com/Queen"><div class="mediaPageName">Queen</div></a></div></li><li><div class="mediaPortrait"><div class="profilePicContainer"><div class="blackBackground"></div><img class="photo img" src="https://fbcdn-profile-a.akamaihd.net/hprofile-ak-xpa1/v/t1.0-1/p80x80/1395394_10151745021285264_2126877109_n.jpg?oh=4688157fb7cdcba5ae00faa4d6d77911&amp;oe=55240FFB&amp;__gda__=1430011181_16a4e7ade29650e8615d1b95b659a928" title="Metallica" height="75" width="75"><div 
class="likeButtonContainer"></div></div><a class="mediaRowItem" href="https://www.facebook.com/Metallica"><div class="mediaPageName">Metallica</div></a></div></li><li><div class="mediaPortrait"><div class="profilePicContainer"><div class="blackBackground"></div><img class="photo img" src="https://fbcdn-profile-a.akamaihd.net/hprofile-ak-frc3/v/t1.0-1/p80x80/537033_10151701254952981_2043344916_n.jpg?oh=2731c9db543d1fba29decf94729585a2&amp;oe=552B2377&amp;__gda__=1429429046_381861d8e3d2283b0da76c45e0339f37" title="Creedence Clearwater Revival" height="75" width="75"><div class="likeButtonContainer"></div></div><a class="mediaRowItem" href="https://www.facebook.com/CCR"><div class="mediaPageName">Creedence Clearwater Revival</div></a></div></li><li><div class="mediaPortrait"><div class="profilePicContainer"><div class="blackBackground"></div><img class="photo img" src="https://fbcdn-profile-a.akamaihd.net/hprofile-ak-xaf1/v/t1.0-1/p80x80/10406831_10152503824276088_447997052969832982_n.jpg?oh=f15bc83a65fc50aba5f4d24b6626fc65&amp;oe=553CC6D5&amp;__gda__=1429526616_03d5eda2091b5254eae8addf2319787d" title="Jimi Hendrix" height="75" width="75"><div class="likeButtonContainer"></div></div><a class="mediaRowItem" href="https://www.facebook.com/JimiHendrix"><div class="mediaPageName">Jimi Hendrix</div></a></div></li></ul></div></td></tr><tr class="spacer"><td colspan="2"><hr></td></tr></tbody><tbody><tr><th class="label"><div class="labelContainer">Games</div></th><td class="data"><div class="mediaRowWrapper"><ul class="uiList pbl mediaRow _509- _4ki _6-h _704 _6-i"><li><div class="mediaPortrait"><div class="profilePicContainer"><div class="blackBackground"></div><img class="photo img" src="https://fbcdn-profile-a.akamaihd.net/hprofile-ak-xaf1/v/t1.0-1/p80x80/10350338_10155107236685556_4391846249065841385_n.jpg?oh=179afe33d10d68af31cb3d3df577cb9d&amp;oe=55322992&amp;__gda__=1430299694_d3261137fb42061b4a6eeb1833e9a30e" title="League of Legends" height="75" width="75"><div 
class="likeButtonContainer"></div></div><a class="mediaRowItem" href="https://www.facebook.com/leagueoflegends"><div class="mediaPageName">League of Legends</div></a></div></li><li><div class="mediaPortrait"><div class="profilePicContainer"><div class="blackBackground"></div><img class="photo img" src="https://fbcdn-profile-a.akamaihd.net/hprofile-ak-xpa1/v/t1.0-1/p80x80/10857718_692946477488149_3478982491235992102_n.jpg?oh=a01453157d14b980487f9604cc2e1ef0&amp;oe=55400821&amp;__gda__=1429135421_e0844416b4710a0be857071a0e779c6e" title="League of Legends - Brasil" height="75" width="75"><div class="likeButtonContainer"></div></div><a class="mediaRowItem" href="https://www.facebook.com/LeagueofLegendsBrasil"><div class="mediaPageName">League of Legends - Brasil</div></a></div></li></ul></div></td></tr><tr class="spacer"><td colspan="2"><hr></td></tr></tbody><tbody><tr><th class="label"><div class="labelContainer">Other</div></th><td class="data"><div class="uiCollapsedList uiCollapsedListHidden uiCollapsedListNoSeparate pagesListData" id="u_0_5"><span class="visible"><a href="https://www.facebook.com/BestOfVines">Best Vines</a>, <a href="https://www.facebook.com/epicvinesofficial">Epic Vines</a></span></div></td></tr><tr class="spacer"><td colspan="2"><hr></td></tr></tbody></table></div></div></div></div></div><div class="timelineLoggedOutRight rfloat _ohf"><div class="fbTimelineSection mtm fbTimelineCompactSection"><div class="_5h60" id="pagelet_search" data-referrer="pagelet_search"><div><div class="uiHeader fbTimelineAboutMeHeader"><div class="clearfix uiHeaderTop"><div><h4 class="uiHeaderTitle">Wrong <a href="/public/Steven-Mcguckin">Steven Mcguckin</a>? 
Try Again</h4></div></div></div><div class="phs"><form class="mvl mhm pts" method="get" action="/search.php" onsubmit="return window.Event &amp;&amp; Event.__inlineSubmit &amp;&amp; Event.__inlineSubmit(this,event)" id="u_0_8"><div class="uiComboInput"><input class="inputtext" value="Steven Mcguckin" name="q" type="text"><label class="comboButton uiButton" for="u_0_7"><input value="Search" id="u_0_7" type="submit"></label></div></form></div></div></div></div><div class="_5h60" id="pagelet_people_same_name" data-referrer="pagelet_people_same_name"><div><div class="fbTimelineSection mtm fbTimelineCompactSection"><div class="uiHeader fbTimelineAboutMeHeader"><div class="clearfix uiHeaderTop"><div><h4 class="uiHeaderTitle">Others named Steven Mcguckin</h4></div></div></div><ul class="uiList phs pts profile-friends _4kg"><li><div class="clearfix"><a class="_8o _8t lfloat _ohe" href="https://www.facebook.com/steven.mcguckin.1238" tabindex="-1" aria-hidden="true"><img class="img" src="https://fbcdn-profile-a.akamaihd.net/hprofile-ak-xfa1/v/t1.0-1/c189.47.583.583/s50x50/523313_109653462547302_1582598403_n.jpg?oh=bd1e7485d44e684999409f897443a767&amp;oe=552EF4EE&amp;__gda__=1433244975_de9b54ed6b21e48002dc07b56c34cfe4" alt="Steven Mcguckin"></a><div class="_42ef"><div class="_6a"><div class="_6a _6b" style="height:50px"></div><div class="_6a _6b"><div class="profileFriendsContent"><div class="profileFriendsText"><strong><a href="https://www.facebook.com/steven.mcguckin.1238">Steven Mcguckin</a></strong></div></div></div></div></div></div></li><li><div class="clearfix"><a class="_8o _8t lfloat _ohe" href="https://www.facebook.com/steven.mcguckin.102" tabindex="-1" aria-hidden="true"><img class="img" src="https://fbcdn-profile-a.akamaihd.net/hprofile-ak-xpa1/v/t1.0-1/p50x50/10888641_1558468697729470_1574411548062292882_n.jpg?oh=ac4263182f6baa14ff9c7f8f9a2d4912&amp;oe=552DE084&amp;__gda__=1428276923_68104f6666539fff8416bb758a245484" alt="Steven Mcguckin"></a><div 
class="_42ef"><div class="_6a"><div class="_6a _6b" style="height:50px"></div><div class="_6a _6b"><div class="profileFriendsContent"><div class="profileFriendsText"><strong><a href="https://www.facebook.com/steven.mcguckin.102">Steven Mcguckin</a></strong></div></div></div></div></div></div></li></ul></div><div class="fbTimelineSection mtm fbTimelineCompactSection"><div class="uiHeader fbTimelineAboutMeHeader fbTimelineInternalHeader"><div class="clearfix uiHeaderTop"><div><h4 class="uiHeaderTitle">Others with a similar name</h4></div></div></div><ul class="uiList phs pts profile-friends _4kg"><li><div class="clearfix"><a class="_8o _8t lfloat _ohe" href="https://www.facebook.com/steven.woltz" tabindex="-1" aria-hidden="true"><img class="img" src="https://fbcdn-profile-a.akamaihd.net/hprofile-ak-frc3/v/t1.0-1/p50x50/1378740_728203910527490_87439746_n.jpg?oh=78c3fee8fb2a5fe8232a7e3861920541&amp;oe=5537F185&amp;__gda__=1428474249_20af1867912771f1e4fb29342863b18c" alt="Steven Woltz"></a><div class="_42ef"><div class="_6a"><div class="_6a _6b" style="height:50px"></div><div class="_6a _6b"><div class="profileFriendsContent"><div class="profileFriendsText"><strong><a href="https://www.facebook.com/steven.woltz">Steven Woltz</a></strong></div></div></div></div></div></div></li><li><div class="clearfix"><a class="_8o _8t lfloat _ohe" href="https://www.facebook.com/steven.post.12" tabindex="-1" aria-hidden="true"><img class="img" src="https://fbcdn-profile-a.akamaihd.net/hprofile-ak-xap1/v/t1.0-1/p50x50/10922431_10204796417863129_9212432233500209728_n.jpg?oh=b2e4752277fe32ae57cefc07ed11a4fd&amp;oe=552812A2&amp;__gda__=1430023335_d7784ee124e6c04636aa22daf9702168" alt="Steven Post"></a><div class="_42ef"><div class="_6a"><div class="_6a _6b" style="height:50px"></div><div class="_6a _6b"><div class="profileFriendsContent"><div class="profileFriendsText"><strong><a href="https://www.facebook.com/steven.post.12">Steven 
Post</a></strong></div></div></div></div></div></div></li><li><div class="clearfix"><a class="_8o _8t lfloat _ohe" href="https://nl-nl.facebook.com/steven.driesen.98" tabindex="-1" aria-hidden="true"><img class="img" src="https://fbcdn-profile-a.akamaihd.net/hprofile-ak-ash2/v/t1.0-1/p50x50/1151013_10202074356919805_986387285_n.jpg?oh=bf601dd1ccc03cfe4cfd27cd2d56b70c&amp;oe=55374CEE&amp;__gda__=1430285150_d3f8b42a1723db70c1e301b62b46e45a" alt="Steven Driesen"></a><div class="_42ef"><div class="_6a"><div class="_6a _6b" style="height:50px"></div><div class="_6a _6b"><div class="profileFriendsContent"><div class="profileFriendsText"><strong><a href="https://nl-nl.facebook.com/steven.driesen.98">Steven Driesen</a></strong></div></div></div></div></div></div></li><li><div class="clearfix"><a class="_8o _8t lfloat _ohe" href="https://en-gb.facebook.com/steven.sutherland.35" tabindex="-1" aria-hidden="true"><img class="img" src="https://fbcdn-profile-a.akamaihd.net/hprofile-ak-xfa1/v/t1.0-1/c107.31.391.391/s50x50/20674_100488083317683_2223071_n.jpg?oh=19a4fa1623a204ffc1f92958dc89a440&amp;oe=5524D113&amp;__gda__=1429645973_595004c5b58702edc2c70de8701c3851" alt="Steven Sutherland"></a><div class="_42ef"><div class="_6a"><div class="_6a _6b" style="height:50px"></div><div class="_6a _6b"><div class="profileFriendsContent"><div class="profileFriendsText"><strong><a href="https://en-gb.facebook.com/steven.sutherland.35">Steven Sutherland</a></strong></div></div></div></div></div></div></li><li><div class="clearfix"><a class="_8o _8t lfloat _ohe" href="https://www.facebook.com/stevekapaun" tabindex="-1" aria-hidden="true"><img class="img" src="https://fbcdn-profile-a.akamaihd.net/hprofile-ak-xfa1/v/t1.0-1/c0.8.50.50/p50x50/1618522_10152293141074122_756396209_n.jpg?oh=9590e197d0633cecd811b7b71ed0b7e7&amp;oe=5536BC7E&amp;__gda__=1428456951_b07ee3c50c1f7edd2b8800e27e619c02" alt="Steven Kapaun"></a><div class="_42ef"><div class="_6a"><div class="_6a _6b" 
style="height:50px"></div><div class="_6a _6b"><div class="profileFriendsContent"><div class="profileFriendsText"><strong><a href="https://www.facebook.com/stevekapaun">Steven Kapaun</a></strong></div></div></div></div></div></div></li><li><div class="clearfix"><a class="_8o _8t lfloat _ohe" href="https://www.facebook.com/jhon.steven.549" tabindex="-1" aria-hidden="true"><img class="img" src="https://fbcdn-profile-a.akamaihd.net/hprofile-ak-xaf1/v/t1.0-1/c8.0.50.50/p50x50/25318_101094259928349_814007_n.jpg?oh=d71ec1eac09fff19f0ce81de6d123dbf&amp;oe=55362389&amp;__gda__=1429253978_2865a36a514b16deef4211b77eb67641" alt="Jhon Steven"></a><div class="_42ef"><div class="_6a"><div class="_6a _6b" style="height:50px"></div><div class="_6a _6b"><div class="profileFriendsContent"><div class="profileFriendsText"><strong><a href="https://www.facebook.com/jhon.steven.549">Jhon Steven</a></strong></div></div></div></div></div></div></li><li><div class="clearfix"><a class="_8o _8t lfloat _ohe" href="https://en-gb.facebook.com/steven.frati.1" tabindex="-1" aria-hidden="true"><img class="img" src="https://fbcdn-profile-a.akamaihd.net/hprofile-ak-xfa1/v/t1.0-1/c50.50.621.621/s50x50/430524_4736301724833_48989225_n.jpg?oh=5ff629341422d0a235680431f0317a35&amp;oe=552DE9B6&amp;__gda__=1428809958_b2d18695108e92b95a00c4ff38c5b356" alt="Steven Frati"></a><div class="_42ef"><div class="_6a"><div class="_6a _6b" style="height:50px"></div><div class="_6a _6b"><div class="profileFriendsContent"><div class="profileFriendsText"><strong><a href="https://en-gb.facebook.com/steven.frati.1">Steven Frati</a></strong></div></div></div></div></div></div></li><li><div class="clearfix"><a class="_8o _8t lfloat _ohe" href="https://www.facebook.com/steven.smyth.18" tabindex="-1" aria-hidden="true"><img class="img" 
src="https://fbcdn-profile-a.akamaihd.net/hprofile-ak-xpf1/v/t1.0-1/p50x50/1526390_10201844128410526_103308447_n.jpg?oh=3ea17bdb075699741ad4c24f07115ad2&amp;oe=552089EF&amp;__gda__=1428354601_d14a13e379dd14f8a9c267a15336de60" alt="Steven Smyth"></a><div class="_42ef"><div class="_6a"><div class="_6a _6b" style="height:50px"></div><div class="_6a _6b"><div class="profileFriendsContent"><div class="profileFriendsText"><strong><a href="https://www.facebook.com/steven.smyth.18">Steven Smyth</a></strong></div></div></div></div></div></div></li></ul></div></div></div><div class="_5h60" id="pagelet_contact" data-referrer="pagelet_contact"><div class="_4qm1"><div class="clearfix _h71"><span class="_h72 lfloat _ohe _50f8 _50f7">Contact Information</span></div><ul class="uiList fbProfileEditExperiences _4kg _4ks"><li class="_2pi4"><span class="_3-9b _50f8 _50f4">No contact info to show</span></li></ul></div></div></div></div></div></div></div><div id="bottomContent"></div></div></div></div></div><div id="pageFooter" data-referrer="page_footer"><div id="contentCurve"></div><div role="contentinfo" aria-label="Facebook site links"><table class="uiGrid _51mz navigationGrid" cellpadding="0" cellspacing="0"><tbody><tr class="_51mx"><td class="_51m- hLeft plm"><a href="/r.php" title="Sign up for Facebook">Sign Up</a></td><td class="_51m- hLeft plm"><a href="/login/" title="Log in to Facebook">Log in</a></td><td class="_51m- hLeft plm"><a href="/mobile/?ref=pf" title="Check out Facebook Mobile.">Mobile</a></td><td class="_51m- hLeft plm"><a href="/find-friends?ref=pf" title="Find anyone on the web.">Find Friends</a></td><td class="_51m- hLeft plm"><a href="/badges/?ref=pf" title="Embed a Facebook badge on your website.">Badges</a></td><td class="_51m- hLeft plm"><a href="/directory/people/" title="Browse our people directory.">People</a></td><td class="_51m- hLeft plm"><a href="/directory/pages/" title="Browse our Pages directory.">Pages</a></td><td class="_51m- hLeft plm"><a 
href="/places/" title="Check out popular places on Facebook.">Places</a></td><td class="_51m- hLeft plm _51mw"><a href="/games/" title="Check out Facebook games.">Games</a></td></tr><tr class="_51mx"><td class="_51m- hLeft plm"><a href="/directory/places/" title="Browse our places directory.">Locations</a></td><td class="_51m- hLeft plm"><a href="/facebook" accesskey="8" title="Read our blog, discover the resource centre and find job opportunities.">About</a></td><td class="_51m- hLeft plm"><a href="/campaign/landing.php?placement=pflo&amp;campaign_id=402047449186&amp;extra_1=auto" title="Advertise on Facebook">Create Advert</a></td><td class="_51m- hLeft plm"><a href="/pages/create/?ref_type=sitefooter" title="Create a Page">Create Page</a></td><td class="_51m- hLeft plm"><a href="https://developers.facebook.com/?ref=pf" title="Develop on our platform.">Developers</a></td><td class="_51m- hLeft plm"><a href="/careers/?ref=pf" title="Make your next career move to our brilliant company.">Careers</a></td><td class="_51m- hLeft plm"><a href="/privacy/explanation" title="Learn about your privacy and Facebook.">Privacy</a></td><td class="_51m- hLeft plm"><a href="/help/cookies/?ref=sitefooter" title="Learn about cookies and Facebook.">Cookies</a></td><td class="_51m- hLeft plm _51mw"><a href="/policies/?ref=pf" accesskey="9" title="Review our terms and policies.">Terms</a></td></tr><tr class="_51mx"><td class="_51m- hLeft plm"><a href="/help/?ref=pf" accesskey="0" title="Visit our Help Centre.">Help</a></td></tr></tbody></table></div><div class="mvl copyright"><div><span> Facebook © 2015</span><div class="fsm fwn fcg"><a rel="dialog" ajaxify="/settings/language/language/?uri=https%3A%2F%2Fwww.facebook.com%2Fsteven.mcguckin.14&amp;source=TOP_LOCALES_DIALOG" title="Use Facebook in another language." 
href="#" role="button">English (UK)</a></div></div></div></div></div></div><script type="text/javascript">/*<![CDATA[*/(function(){function si_cj(m){setTimeout(function(){new Image().src="https:\/\/error.facebook.com\/common\/scribe_endpoint.php?c=si_clickjacking&t=4737"+"&m="+m;},5000);}if(top!=self && !false){try{if(parent!=top){throw 1;}var si_cj_d=["apps.facebook.com","apps.beta.facebook.com"];var href=top.location.href.toLowerCase();for(var i=0;i<si_cj_d.length;i++){if (href.indexOf(si_cj_d[i])>=0){throw 1;}}si_cj("3 https:\/\/www.facebook.com\/public\/Steven-Mcguckin");}catch(e){si_cj("1 \thttps:\/\/www.facebook.com\/public\/Steven-Mcguckin");window.document.write("\u003Cstyle>body * {display:none !important;}\u003C\/style>\u003Ca href=\"#\" onclick=\"top.location.href=window.location.href\" style=\"display:block !important;padding:10px\">\u003Ci class=\"img sp_qIp5uuAkFU5 sx_7692ea\" style=\"display:block !important\">\u003C\/i>Go to Facebook.com\u003C\/a>");/*lnHZrpcW*/}}}())/*]]>*/</script> 
</body> 
+0

It is possible. Can you paste the HTML code into the question instead of the image? Please paste the full HTML. – Arya

+0

I've added the HTML. – Highway62

+0

Have your browser show you the XPath for the 'div', then let HtmlUnit access that element through it. That seems simpler than searching by attribute value. – MrSmith42
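As a rough illustration of the XPath approach suggested in the comment above, the sketch below selects every div whose class is mediaPageName and collects its text. It uses the JDK's javax.xml.xpath API on a trimmed, well-formed excerpt of the posted markup, so it runs without HtmlUnit; the class name LikesByXPath and the shortened SAMPLE string are assumptions for illustration. HtmlUnit's page.getByXPath(...) takes the same kind of expression, though whether it finds anything depends on what HtmlUnit actually rendered.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper; a JDK-only stand-in for the HtmlUnit lookup.
final class LikesByXPath {

    // Trimmed excerpt of the posted HTML (images and most attributes removed).
    static final String SAMPLE =
            "<div id=\"favorites\"><ul>"
          + "<li><div class=\"mediaPortrait\"><a class=\"mediaRowItem\">"
          + "<div class=\"mediaPageName\">Queen</div></a></div></li>"
          + "<li><div class=\"mediaPortrait\"><a class=\"mediaRowItem\">"
          + "<div class=\"mediaPageName\">Metallica</div></a></div></li>"
          + "<li><div class=\"mediaPortrait\"><a class=\"mediaRowItem\">"
          + "<div class=\"mediaPageName\">League of Legends</div></a></div></li>"
          + "</ul></div>";

    // Returns the text of every div with class="mediaPageName", in document order.
    static List<String> extractLikes(String xml) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                    "//div[@class='mediaPageName']",
                    new InputSource(new StringReader(xml)),
                    XPathConstants.NODESET);
            List<String> likes = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                likes.add(nodes.item(i).getTextContent());
            }
            return likes;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(extractLikes(SAMPLE)); // prints [Queen, Metallica, League of Legends]
    }
}
```

Matching on the class attribute rather than on element positions keeps the expression short and independent of how deeply the divs are nested.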

Answers

0

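I solved this by dropping HtmlUnit altogether and using Selenium instead, which appears to be much better at dealing with JavaScript. The following code uses Selenium and extracts exactly the information I was looking for: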

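import java.util.List;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;

WebDriver driver = new FirefoxDriver();

driver.get(URL); // the Facebook URL I'm trying to scrape

// Poll for up to 5 seconds until the first like is displayed.
// findElements returns an empty list while nothing matches, whereas
// findElement would throw NoSuchElementException and abort the wait.
long end = System.currentTimeMillis() + 5000;

while (System.currentTimeMillis() < end) {
    List<WebElement> found = driver.findElements(By.className("mediaPageName"));

    if (!found.isEmpty() && found.get(0).isDisplayed()) {
        break;
    }
}

// Collect every like and print its visible text
List<WebElement> allDivs = driver.findElements(By.xpath("//div[@class='mediaPageName']"));
for (WebElement likes : allDivs) {
    System.out.println(likes.getText());
}

driver.quit();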
0

Try this; it should work:

HtmlElement element = page.getFirstByXPath("//div[@id='favorites']/div[2]/table/tbody[2]/tr/td/div/ul/li[2]/div/a/div");
String text = element.getTextContent();
+0

I tried that, but element = page.getFirstByXPath("//div[@id='favorites']/div[2]/table/tbody[2]/tr/td/div/ul/li[2]/div/a/div"); returns 'null'. – Highway62

+0

@Highway62 I got the XPath from the HTML code you pasted. Get the XPath while the whole web page is loaded in a browser and try again with that XPath. If that doesn't work, print the HTML code from HtmlUnit and take the XPath from the HTML that HtmlUnit rendered. – Arya

+0

I tried using the exact XPath I got from the element: HtmlElement element = page.getFirstByXPath("/html/body/div/div[2]/div[1]/div/div[2]/div[2 div/div/div[2]/table/tbody[2]/tr[1]/td/div/ul/li[2]/div/a/div"); but that returned null as well. – Highway62
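When a long positional XPath returns null, one way to narrow it down is to evaluate the path one step at a time and see which step first matches nothing. The sketch below does this with the JDK's XPath API on a small made-up document that has a single tbody, so the tbody[2] step is the one that fails. The class name XPathPrefixCheck, the helper firstFailingStep and the SAMPLE markup are all assumptions for illustration; with HtmlUnit, the same idea could be applied by calling page.getFirstByXPath on each successively longer prefix.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical diagnostic helper; JDK-only, not part of HtmlUnit.
final class XPathPrefixCheck {

    // Made-up document: the favorites section holds a table with ONE tbody.
    static final String SAMPLE =
            "<div id=\"favorites\"><div class=\"uiHeader\"/><div class=\"phs\">"
          + "<table><tbody><tr><td><div class=\"mediaPageName\">Queen</div></td></tr>"
          + "</tbody></table></div></div>";

    // Appends the steps one by one and returns the index of the first step
    // whose accumulated path matches no node, or -1 if every prefix matches.
    static int firstFailingStep(String xml, String... steps) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            StringBuilder path = new StringBuilder();
            for (int i = 0; i < steps.length; i++) {
                path.append(steps[i]);
                Node hit = (Node) xpath.evaluate(path.toString(), doc, XPathConstants.NODE);
                if (hit == null) {
                    return i;
                }
            }
            return -1;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not check XPath", e);
        }
    }

    public static void main(String[] args) {
        int failing = firstFailingStep(SAMPLE,
                "//div[@id='favorites']", "/div[2]", "/table", "/tbody[2]", "/tr", "/td", "/div");
        System.out.println(failing); // prints 3, the index of the "/tbody[2]" step
    }
}
```

If even the first step matches nothing, the section is probably missing from the DOM that was parsed, and no amount of adjusting indices further down the path will help.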
