2016-12-22 7 views
0

Moving data from a dict to a Pandas DataFrame

I have done a lot of research on this topic, and although there are discussions about moving data from a dictionary into a DataFrame, none of those posts resolves my dilemma.

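Basically, I parsed data from an HTML table and stored it in a dictionary. Now I need to move it into a DataFrame so that I can export all of it to an Oracle table. The bad part is that my dictionary has only one key, and all of the data are its values. So when I create a DataFrame from the data, it comes out as 1 column by 75 rows.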
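To make the shape problem concrete: a dictionary with a single key becomes a single column, while a dictionary with one key per column gives the table-like layout. A minimal sketch, assuming the values have already been grouped per column (the short lists are illustrative samples taken from the table, not the full data):

```python
import pandas as pd

# One key holding every value: pandas builds one column with one row per item.
flat = {"row_of_data": ["ID", "Date Posted", "1724", "5/20", "1578", "5/20"]}
flat_df = pd.DataFrame(flat)
print(flat_df.shape)

# One key per column: each list becomes its own column.
by_column = {"ID": ["1724", "1578"], "Date Posted": ["5/20", "5/20"]}
wide_df = pd.DataFrame(by_column)
print(wide_df.shape)
```

The second form only works if the lists all have the same length, which is why the row and cell boundaries need to survive the parsing step.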

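I need to know how to put specific data into specific columns, if that is possible. Can Python or Pandas tell the difference between integers and strings and work out which columns to put them in? I ask because this code has to work on tables parsed from many different files.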
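On the type question: values scraped from HTML arrive as strings, so pandas keeps them as object columns until they are converted explicitly. A minimal sketch using `pd.to_numeric`, assuming the columns have already been separated (the sample values come from the table; stripping `$` and `,` is my assumption about the desired result):

```python
import pandas as pd

# Sample cells from the quota table, still in their scraped string form.
df = pd.DataFrame({
    "ID": ["1724", "347", "1878A"],
    "Live Weight Pounds": ["2328", "8,000", "6188"],
    "Price": ["$9,000", "$0.50", "$1.95"],
})

# Remove thousands separators, then convert to integers.
df["Live Weight Pounds"] = pd.to_numeric(
    df["Live Weight Pounds"].str.replace(",", "", regex=False)
)

# Remove the currency symbol and separators, then convert to floats.
df["Price"] = pd.to_numeric(df["Price"].str.replace(r"[$,]", "", regex=True))

# ID stays a string column because values such as "1878A" are not numeric.
print(df.dtypes)
```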

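I only pull the data for the most recent date, so all of the data corresponds to 5/20.

Here is my attempt at putting the data into a DataFrame, which is what I currently get when I print it: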

     0 
0     ID 
1  Available Quota 
2 Live Weight Pounds 
3    Price 
4   Date Posted 
5     1724 
6    GOM COD 
7    GOM HADD 
8    GOM BB 
9    GREYSOLE 
10    DABS 
11    GOM YT 
12    2328 
13     445 
14    3007 
15     850 
16    3101 
17    1995 
18    Package 
19    $9,000 
20    5/20 
21    1578 
22    GBE COD 
23    GBW COD 
24    GB BB 
25    GB YT 
26    SNE BB 
27    SNE YT 
28    GOM BB 
29    Whake 
..     ... 
45    $1.00 
46    $0.45 
47    $0.50 
48    $0.15 
49    $0.20 
50    $0.01 
51    $0.01 
52    5/20 
53     310 
54    GBE COD 
55    GBW COD 
56    DABS 
57    WHAKE 
58    POLL 
59     RED 
60    SNE BB 
61    GOM BB 
62     825 
63    9033 
64    1241 
65    3120 
66    65234 
67    76610 
68    1688 
69    1195 
70    2121 
71    7285 
72    Package 
73    $15,000 
74    5/20 

[75 rows x 1 columns] 
<class 'pandas.core.frame.DataFrame'> 

Here is my data as a dictionary, after being parsed:

{'row_of_data': ['ID', 'Available Quota', 'Live Weight Pounds', 'Price', 'Date Posted', '1724', 'GOM COD', 'GOM HADD', 'GOM BB', 'GREYSOLE', 'DABS', 'GOM YT', '2328', '445', '3007', '850', '3101', '1995', 'Package', '$9,000', '5/20', '1578', 'GBE COD', 'GBW COD', 'GB BB', 'GB YT', 'SNE BB', 'SNE YT', 'GOM BB', 'Whake', 'POLL', 'RED', '538', '5894', '1755', '243', '490', '153', '3965', '2727', '9227', '15060', '$1.00', '$0.40', '$0.20', '$1.00', '$0.45', '$0.50', '$0.15', '$0.20', '$0.01', '$0.01', '5/20', '310', 'GBE COD', 'GBW COD', 'DABS', 'WHAKE', 'POLL', 'RED', 'SNE BB', 'GOM BB', '825', '9033', '1241', '3120', '65234', '76610', '1688', '1195', '2121', '7285', 'Package', '$15,000', '5/20']}

Ideally, I would like it to print as a clean DataFrame that resembles the original data table.

Here is my relevant code:

for s in var2:
    if s == str1:
        var4 = {'row_of_data': []}
        for idx, val in enumerate(s):
            var4['row_of_data'].extend(rows[idx].stripped_strings)

fish = np.array(values)
print(fish)

fishdf = pd.DataFrame(fish)
print(fishdf)
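The code above extends one flat list with the strings of every cell, which is where the row and column boundaries are lost. A minimal sketch of keeping them, using the standard library's `html.parser` in place of BeautifulSoup (my substitution, so the example has no third-party dependency); the class name `TableRows` and the shortened sample table are hypothetical:

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect table cells row by row; text after a <br> becomes a new item."""

    def __init__(self):
        super().__init__()
        self.rows = []     # one list per <tr>, holding one list per cell
        self._cell = None  # items of the cell being read, None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self._cell = []
            self.rows[-1].append(self._cell)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._cell = None

    def handle_data(self, data):
        text = data.strip()
        if self._cell is not None and text:
            self._cell.append(text)


sample = (
    "<table>"
    "<tr><td>ID</td><td>Available Quota</td><td>Live Weight Pounds</td></tr>"
    "<tr><td>1724</td><td>GOM COD<br>GOM HADD</td><td>2328<br>445</td></tr>"
    "</table>"
)

parser = TableRows()
parser.feed(sample)
parser.close()
print(parser.rows)
```

Each data row then carries its ID in the first cell and parallel lists of quota names and weights in the next two, so it can be expanded into one DataFrame row per quota line.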

And here is the HTML code:

<html> 
<head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> 
<title>FW: NEFS 2 Available Quota 5/21</title> 
<link rel="important stylesheet" href=""> 
<style>div.headerdisplayname {font-weight:bold;}</style></head> 
<body> 
<table border=0 cellspacing=0 cellpadding=0 width="100%" class="header-part1"><tr><td><b>Subject: </b>FW: NEFS 2 Available Quota 5/21</td></tr><tr><td><b>From: </b>Claire Fitz-Gerald <[email protected]></td></tr><tr><td><b>Date: </b>5/21/2014 10:08 AM</td></tr></table><br> 
<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"><head><META HTTP-EQUIV="Content-Type" CONTENT="text/html; "><meta name=Generator content="Microsoft Word 12 (filtered medium)"><!--[if !mso]><style>v\:* {behavior:url(#default#VML);} 
o\:* {behavior:url(#default#VML);} 
w\:* {behavior:url(#default#VML);} 
.shape {behavior:url(#default#VML);} 
</style><![endif]--><style><!-- 
/* Font Definitions */ 
@font-face 
    {font-family:"Cambria Math"; 
    panose-1:2 4 5 3 5 4 6 3 2 4;} 
@font-face 
    {font-family:Calibri; 
    panose-1:2 15 5 2 2 2 4 3 2 4;} 
@font-face 
    {font-family:Tahoma; 
    panose-1:2 11 6 4 3 5 4 4 2 4;} 
@font-face 
    {font-family:"Franklin Gothic Book"; 
    panose-1:2 11 5 3 2 1 2 2 2 4;} 
@font-face 
    {font-family:"Franklin Gothic Demi"; 
    panose-1:2 11 7 3 2 1 2 2 2 4;} 
/* Style Definitions */ 
p.MsoNormal, li.MsoNormal, div.MsoNormal 
    {margin:0in; 
    margin-bottom:.0001pt; 
    font-size:11.0pt; 
    font-family:"Calibri","sans-serif";} 
a:link, span.MsoHyperlink 
    {mso-style-priority:99; 
    color:blue; 
    text-decoration:underline;} 
a:visited, span.MsoHyperlinkFollowed 
    {mso-style-priority:99; 
    color:purple; 
    text-decoration:underline;} 
span.EmailStyle17 
    {mso-style-type:personal; 
    font-family:"Calibri","sans-serif"; 
    color:windowtext;} 
span.title1 
    {mso-style-name:title1; 
    font-family:"Arial","sans-serif"; 
    color:#1F487E; 
    font-weight:normal;} 
span.EmailStyle19 
    {mso-style-type:personal-reply; 
    font-family:"Calibri","sans-serif"; 
    color:#1F497D;} 
.MsoChpDefault 
    {mso-style-type:export-only; 
    font-size:10.0pt;} 
@page WordSection1 
    {size:8.5in 11.0in; 
    margin:1.0in 1.0in 1.0in 1.0in;} 
div.WordSection1 
    {page:WordSection1;} 
--></style><!--[if gte mso 9]><xml> 
<o:shapedefaults v:ext="edit" spidmax="1026" /> 
</xml><![endif]--><!--[if gte mso 9]><xml> 
<o:shapelayout v:ext="edit"> 
<o:idmap v:ext="edit" data="1" /> 
</o:shapelayout></xml><![endif]--></head><body lang=EN-US link=blue vlink=purple><div class=WordSection1><p class=MsoNormal><span style='color:#1F497D'>Please see the below quota listings.<o:p></o:p></span></p><p class=MsoNormal><span style='color:#1F497D'><o:p>&nbsp;</o:p></span></p><p class=MsoNormal><span style='color:#1F497D'>Thanks,<o:p></o:p></span></p><p class=MsoNormal><span style='color:#1F497D'><o:p>&nbsp;</o:p></span></p><div><p class=MsoNormal><span style='font-size:12.0pt;font-family:"Franklin Gothic Book","sans-serif";color:#1F497D'>Claire Fitz-Gerald<o:p></o:p></span></p><p class=MsoNormal><i><span style='font-size:10.0pt;font-family:"Franklin Gothic Book","sans-serif";color:#1F497D'><o:p>&nbsp;</o:p></span></i></p><p class=MsoNormal><b><span style='font-family:"Franklin Gothic Demi","sans-serif";color:#002776'>Cape Cod Commercial Fishermen's Alliance<o:p></o:p></span></b></p><p class=MsoNormal><b><span style='font-family:"Franklin Gothic Book","sans-serif";color:#DE3500'>~ Small Boats.&nbsp; Big Ideas. 
~</span></b><b><span style='color:#DE3500'><o:p></o:p></span></b></p></div><p class=MsoNormal><span style='color:#1F497D'><o:p>&nbsp;</o:p></span></p><div><div style='border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in'><p class=MsoNormal><b><span style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'>From:</span></b><span style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'> David Leveille [mailto:[email protected]] <br><b>Sent:</b> Wednesday, May 21, 2014 8:50 AM<br><b>To:</b> David Leveille<br><b>Subject:</b> NEFS 2 Available Quota 5/21<o:p></o:p></span></p></div></div><p class=MsoNormal><o:p>&nbsp;</o:p></p><p class=MsoNormal><span style='font-size:12.0pt;font-family:"Arial","sans-serif";color:#1F487E'>AVAILABLE QUOTA FY 2014</span><span style='font-size:12.0pt;font-family:"Times New Roman","serif"'><o:p></o:p></span></p><table class=MsoNormalTable border=0 cellspacing=0 cellpadding=0 width="71%" style='width:71.28%'><tr><td width=220 style='width:164.95pt;border:none;border-bottom:solid windowtext 1.0pt;background:#8BCDFF;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><b><span style='font-size:9.0pt;font-family:"Arial","sans-serif";color:black'>ID <o:p></o:p></span></b></p></td><td width=161 style='width:120.75pt;border:none;border-bottom:solid windowtext 1.0pt;background:#8BCDFF;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='mso-line-height-alt:15.0pt'><b><span style='font-size:18.0pt;font-family:"Arial","sans-serif";color:black'>Available Quota <o:p></o:p></span></b></p></td><td width=189 style='width:141.75pt;border:none;border-bottom:solid windowtext 1.0pt;background:#8BCDFF;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='mso-line-height-alt:15.0pt'><b><span style='font-size:18.0pt;font-family:"Arial","sans-serif";color:black'>Live Weight Pounds <o:p></o:p></span></b></p></td><td width=126 style='width:94.55pt;border:none;border-bottom:solid windowtext 
1.0pt;background:#8BCDFF;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='mso-line-height-alt:15.0pt'><b><span style='font-size:18.0pt;font-family:"Arial","sans-serif";color:black'>Price <o:p></o:p></span></b></p></td><td width=168 style='width:125.95pt;border:none;border-bottom:solid windowtext 1.0pt;background:#8BCDFF;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='mso-line-height-alt:15.0pt'><b><span style='font-size:18.0pt;font-family:"Arial","sans-serif";color:black'>Date Posted <o:p></o:p></span></b></p></td></tr><tr><td width=220 style='width:164.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>1724<o:p></o:p></span></p></td><td width=161 style='width:120.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>GOM COD<br>GOM HADD<br>GOM BB<br>GREYSOLE<br>DABS<br>GOM YT<o:p></o:p></span></p></td><td width=189 style='width:141.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>2328<br>445<br>3007<br>850<br>3101<br>1995<o:p></o:p></span></p></td><td width=126 style='width:94.55pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>Package<o:p></o:p></span></p><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'><o:p>&nbsp;</o:p></span></p><p class=MsoNormal style='line-height:15.0pt'><span 
style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>$9,000<o:p></o:p></span></p></td><td width=168 style='width:125.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>5/20<o:p></o:p></span></p></td></tr><tr><td width=220 style='width:164.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>1578<o:p></o:p></span></p></td><td width=161 style='width:120.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>GBE COD<br>GBW COD<br>GB BB<br>GB YT<br>SNE BB<br>SNE YT<br>GOM BB<br>Whake<br>POLL<br>RED<o:p></o:p></span></p></td><td width=189 style='width:141.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>538<br>5894<br>1755<br>243<br>490<br>153<br>3965<br>2727<br>9227<br>15060<o:p></o:p></span></p></td><td width=126 style='width:94.55pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>$1.00<br>$0.40<br>$0.20<br>$1.00<br>$0.45<br>$0.50<br>$0.15<br>$0.20<br>$0.01<br>$0.01<o:p></o:p></span></p></td><td width=168 style='width:125.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>5/20<o:p></o:p></span></p></td></tr><tr><td 
width=220 style='width:164.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>310<o:p></o:p></span></p></td><td width=161 style='width:120.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>GBE COD<br>GBW COD<br>DABS<br>WHAKE<br>POLL<br>RED<br>SNE BB<br>GOM BB<o:p></o:p></span></p></td><td width=189 style='width:141.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>825<br>9033<br>1241<br>3120<br>65234<br>76610<br>1688<br>1195<br>2121<br>7285<o:p></o:p></span></p></td><td width=126 style='width:94.55pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>Package<o:p></o:p></span></p><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'><o:p>&nbsp;</o:p></span></p><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>$15,000<o:p></o:p></span></p></td><td width=168 style='width:125.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>5/20<o:p></o:p></span></p></td></tr><tr style='height:23.25pt'><td width=220 style='width:164.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt;height:23.25pt'><p class=MsoNormal 
style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>347<o:p></o:p></span></p></td><td width=161 style='width:120.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt;height:23.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>SNE BB<o:p></o:p></span></p></td><td width=189 style='width:141.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt;height:23.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>8,000<o:p></o:p></span></p></td><td width=126 style='width:94.55pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt;height:23.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>$0.50<o:p></o:p></span></p></td><td width=168 style='width:125.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt;height:23.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>5/7<o:p></o:p></span></p></td></tr><tr><td width=220 style='width:164.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>1878A<o:p></o:p></span></p></td><td width=161 style='width:120.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>GOM COD<br>GOM HADD<br>SNE BB<br>GOM BB<br>GB BB<br>GREYSOLE<br>GOM YT<br>SNE YT<br>POLL<o:p></o:p></span></p></td><td width=189 style='width:141.75pt;border:solid windowtext 
1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>6188<br>635<br>3916<br>7873<br>6762<br>3358<br>9776<br>271<br>186550<o:p></o:p></span></p></td><td width=126 style='width:94.55pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>$1.95<br>$1.35<br>$0.50<br>$0.50<br>$0.20<br>$1.40<br>$1.20<br>$0.50<br>$0.01<o:p></o:p></span></p></td><td width=168 style='width:125.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>5/12<o:p></o:p></span></p></td></tr><tr><td width=220 style='width:164.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>1878B<o:p></o:p></span></p></td><td width=161 style='width:120.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>GBE COD<br>GBW COD<br>GB YT<o:p></o:p></span></p></td><td width=189 style='width:141.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>1113<br>12186<br>850<o:p></o:p></span></p></td><td width=126 style='width:94.55pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span 
style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>Package<br>$10,000<o:p></o:p></span></p></td><td width=168 style='width:125.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>5/12<o:p></o:p></span></p></td></tr></table><p class=MsoNormal><o:p>&nbsp;</o:p></p><p class=MsoNormal><o:p>&nbsp;</o:p></p><p class=MsoNormal>David Leveille<o:p></o:p></p><p class=MsoNormal>II Northeast Fishery Sector Inc.<o:p></o:p></p><p class=MsoNormal>10 Witham Street<o:p></o:p></p><p class=MsoNormal>Gloucester, MA. 01930<o:p></o:p></p><p class=MsoNormal>Cell 978 375 3509<o:p></o:p></p><p class=MsoNormal>Fax 978 281 1555<o:p></o:p></p><p class=MsoNormal>Web <a href="http://nefs2.com/">http://nefs2.com/</a><o:p></o:p></p><p class=MsoNormal><o:p>&nbsp;</o:p></p><div class=MsoNormal align=center style='text-align:center'><span style='font-size:12.0pt;font-family:"Times New Roman","serif"'></body></html> 
</body> 
</html> 

Using the new code:

from bs4 import BeautifulSoup 
import os 
import re 
import numpy as np 
import pandas as pd 
import cx_Oracle 
import lxml 
import html5lib 

path = 'C:\\EVERYTHING FROM Z DRIVE\\blub' 

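def readhtml(path):
    # read_html returns a list of DataFrames, one per <table> it finds
    df = pd.read_html(io=path)
    print(df)

if __name__ == "__main__":
    readhtml(path)  # pass the variable, not the literal string 'path'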
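+2

Have you tried using pandas.read_html? http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_html.html#pandas-read-html. There is also an option to specify the data type for each column using converters. – Shijo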
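As a rough illustration of the converters option mentioned in this comment, here is a minimal sketch. The inline table and the helper `to_price` are hypothetical; `header=0` and `converters` are existing `read_html` parameters. The `try`/`except` is there because `read_html` raises `ImportError` when no HTML parser (lxml, or bs4 with html5lib) is installed:

```python
from io import StringIO

import pandas as pd

# Hypothetical two-row table in the same shape as the quota listing.
html = """
<table>
  <tr><td>ID</td><td>Price</td><td>Date Posted</td></tr>
  <tr><td>347</td><td>$0.50</td><td>5/7</td></tr>
  <tr><td>1878A</td><td>$1.95</td><td>5/12</td></tr>
</table>
"""


def to_price(text):
    """Turn a cell such as '$0.50' into the float 0.5."""
    return float(str(text).replace("$", "").replace(",", ""))


try:
    # header=0 uses the first table row as column names;
    # converters maps a column name to a function applied to each cell.
    tables = pd.read_html(
        StringIO(html), header=0, converters={"ID": str, "Price": to_price}
    )
except ImportError:
    tables = []  # no HTML parser is available in this environment

for table in tables:
    print(table)
```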

+0

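Hmm, I didn't even know pandas could read tables. The thing with this project is that I need robust code that can parse many HTML tables. So my code currently captures the most recent date at the top right, checks with an "if" whether it occurs again, and captures all of the corresponding data.... Can pandas.read_html do something like that? – theprowler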

+0

Yes, you can first read the HTML table into pandas and then apply any kind of filter to the DataFrame. – Shijo
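A minimal sketch of that kind of filter, keeping only the rows with the most recent posting date. The IDs and dates are the ones in the quota table; treating every date as 2014 is an assumption based on the year in the email header:

```python
import pandas as pd

# ID and Date Posted values as they appear in the quota table.
df = pd.DataFrame({
    "ID": ["1724", "1578", "310", "347", "1878A", "1878B"],
    "Date Posted": ["5/20", "5/20", "5/20", "5/7", "5/12", "5/12"],
})

# Assumption: every date falls in 2014, the year in the email header.
posted = pd.to_datetime(df["Date Posted"] + "/2014", format="%m/%d/%Y")

# Keep only the rows whose date equals the latest one.
latest = df[posted == posted.max()]
print(latest)
```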

Answers

0
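import pandas as pd


def readhtml(path):
    # read_html returns a list with one DataFrame per <table> in the file
    df = pd.read_html(io=path)
    print(df)


if __name__ == "__main__":
    # raw string so the backslashes in the Windows path are not escapes
    readhtml(r'C:\Users\xxx\samples.html')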

This is the output generated by this program for the sample HTML used for testing, which is reproduced after the output:

[      0    1    2    3
0     ID   H1   H2   H3
1      1   X1   Y1   Y2
2    NaN   X2   Y1   Y2
3    NaN   X3   Y1   Y2
4    NaN   X4   Y1   Y2
5      2   X5   Y1   Y2
6    NaN   X6   Y1   Y2
7    NaN   X7   Y1   Y2
8    NaN   X8   Y1   Y2
9    NaN   X9   Y1   Y2
10   NaN  X10   Y1   Y2
11     2  X11   Y1   Y2
12   NaN  NaN  NaN  NaN]
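In this output the ID is NaN wherever the source cell was empty, and the last row is entirely NaN. A minimal sketch of one way to tidy that, assuming an empty ID means "same ID as the row above" and that the header row has already been promoted to column names (for example with `header=0`); the small frame below is a hypothetical stand-in for the real result:

```python
import numpy as np
import pandas as pd

# Hypothetical frame shaped like the output above, header already applied.
df = pd.DataFrame({
    "ID": [1, np.nan, np.nan, 2, np.nan, np.nan],
    "H1": ["X1", "X2", "X3", "X5", "X6", np.nan],
})

df = (
    df.dropna(how="all")  # drop rows in which every value is NaN
      .assign(ID=lambda d: d["ID"].ffill().astype(int))  # carry each ID down
)
print(df)
```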

<html xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns="http://www.w3.org/TR/REC-html40">

<head>
<meta http-equiv=Content-Type content="text/html; charset=windows-1252">
<meta name=ProgId content=Excel.Sheet>
<meta name=Generator content="Microsoft Excel 14">
<link rel=File-List href="samples_files/filelist.xml">
<style id="sample_14456_Styles">
<!--table
    {mso-displayed-decimal-separator:"\.";
    mso-displayed-thousand-separator:"\,";}
.xl1514456
    {padding-top:1px;
    padding-right:1px;
    padding-left:1px;
    mso-ignore:padding;
    color:black;
    font-size:11.0pt;
    font-weight:400;
    font-style:normal;
    text-decoration:none;
    font-family:Calibri, sans-serif;
    mso-font-charset:0;
    mso-number-format:General;
    text-align:general;
    vertical-align:bottom;
    mso-background-source:auto;
    mso-pattern:auto;
    white-space:nowrap;}
.xl6514456
    {padding-top:1px;
    padding-right:1px;
    padding-left:1px;
    mso-ignore:padding;
    color:#3E4349;
    font-size:9.0pt;
    font-weight:700;
    font-style:normal;
    text-decoration:none;
    font-family:Arial, sans-serif;
    mso-font-charset:0;
    mso-number-format:General;
    text-align:general;
    vertical-align:bottom;
    mso-background-source:auto;
    mso-pattern:auto;
    white-space:nowrap;}
-->
</style>
</head>

<body>
<!--[if !excel]>&nbsp;&nbsp;<![endif]-->
<!--The following information was generated by Microsoft Excel's Publish as Web
Page wizard.-->
<!--If the same item is republished from Excel, all information between the DIV
tags will be replaced.-->
<!----------------------------->
<!--START OF OUTPUT FROM EXCEL PUBLISH AS WEB PAGE WIZARD -->
<!----------------------------->

<div id="sample_14456" align=center x:publishsource="Excel">

<table border=0 cellpadding=0 cellspacing=0 width=256 style='border-collapse:
collapse;table-layout:fixed;width:192pt'>
<col width=64 span=4 style='width:48pt'>
<tr height=19 style='height:14.4pt'>
    <td height=19 class=xl6514456 width=64 style='height:14.4pt;width:48pt'>ID</td>
    <td class=xl1514456 width=64 style='width:48pt'>H1</td>
    <td class=xl1514456 width=64 style='width:48pt'>H2</td>
    <td class=xl1514456 width=64 style='width:48pt'>H3</td>
</tr>
<tr height=19 style='height:14.4pt'>
    <td height=19 class=xl1514456 align=right style='height:14.4pt'>1</td>
    <td class=xl1514456>X1</td>
    <td class=xl1514456>Y1</td>
    <td class=xl1514456>Y2</td>
</tr>
<tr height=19 style='height:14.4pt'>
    <td height=19 class=xl1514456 style='height:14.4pt'></td>
    <td class=xl1514456>X2</td>
    <td class=xl1514456>Y1</td>
    <td class=xl1514456>Y2</td>
</tr>
<tr height=19 style='height:14.4pt'>
    <td height=19 class=xl1514456 style='height:14.4pt'></td>
    <td class=xl1514456>X3</td>
    <td class=xl1514456>Y1</td>
    <td class=xl1514456>Y2</td>
</tr>
<tr height=19 style='height:14.4pt'>
    <td height=19 class=xl1514456 style='height:14.4pt'></td>
    <td class=xl1514456>X4</td>
    <td class=xl1514456>Y1</td>
    <td class=xl1514456>Y2</td>
</tr>
<tr height=19 style='height:14.4pt'>
    <td height=19 class=xl1514456 align=right style='height:14.4pt'>2</td>
    <td class=xl1514456>X5</td>
    <td class=xl1514456>Y1</td>
    <td class=xl1514456>Y2</td>
</tr>
<tr height=19 style='height:14.4pt'>
    <td height=19 class=xl1514456 style='height:14.4pt'></td>
    <td class=xl1514456>X6</td>
    <td class=xl1514456>Y1</td>
    <td class=xl1514456>Y2</td>
</tr>
<tr height=19 style='height:14.4pt'>
    <td height=19 class=xl1514456 style='height:14.4pt'></td>
    <td class=xl1514456>X7</td>
    <td class=xl1514456>Y1</td>
    <td class=xl1514456>Y2</td>
</tr>
<tr height=19 style='height:14.4pt'>
    <td height=19 class=xl1514456 style='height:14.4pt'></td>
    <td class=xl1514456>X8</td>
    <td class=xl1514456>Y1</td>
    <td class=xl1514456>Y2</td>
</tr>
<tr height=19 style='height:14.4pt'>
    <td height=19 class=xl1514456 style='height:14.4pt'></td>
    <td class=xl1514456>X9</td>
    <td class=xl1514456>Y1</td>
    <td class=xl1514456>Y2</td>
</tr>
<tr height=19 style='height:14.4pt'>
    <td height=19 class=xl1514456 style='height:14.4pt'></td>
    <td class=xl1514456>X10</td>
    <td class=xl1514456>Y1</td>
    <td class=xl1514456>Y2</td>
</tr>
<tr height=19 style='height:14.4pt'>
    <td height=19 class=xl1514456 align=right style='height:14.4pt'>2</td>
    <td class=xl1514456>X11</td>
    <td class=xl1514456>Y1</td>
    <td class=xl1514456>Y2</td>
</tr>
<![if supportMisalignedColumns]>
<tr height=0 style='display:none'>
    <td width=64 style='width:48pt'></td>
    <td width=64 style='width:48pt'></td>
    <td width=64 style='width:48pt'></td>
    <td width=64 style='width:48pt'></td>
</tr>
<![endif]>
</table>

</div>

<!----------------------------->
<!--END OF OUTPUT FROM EXCEL PUBLISH AS WEB PAGE WIZARD-->
<!----------------------------->
</body>

</html>

+0

I tried it and got 'lxml not found, please install it'. – theprowler

+0

OK, you need to install the lxml library. – Shijo

+0

Lol, honestly I have been struggling to install this lxml library for an hour. Usually I can download a package and then do a simple pip install in the command window, but in this case it isn't working. – theprowler
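For what it's worth, `read_html` tries lxml by default and falls back to bs4 with html5lib, so it may help to check which parsers the running interpreter can actually see. A minimal sketch using only the standard library (the helper name `available_flavors` is hypothetical):

```python
import importlib.util


def available_flavors():
    """Return the read_html flavors whose packages can be imported here."""

    def installed(name):
        return importlib.util.find_spec(name) is not None

    flavors = []
    if installed("lxml"):
        flavors.append("lxml")
    # The bs4 flavor needs both BeautifulSoup and html5lib.
    if installed("bs4") and installed("html5lib"):
        flavors.append("bs4")
    return flavors


flavors = available_flavors()
print(flavors)
```

If only `"bs4"` is listed, passing `flavor="bs4"` to `read_html` avoids lxml entirely. Running `python -m pip install lxml` with the same interpreter that runs the script also rules out installing into a different Python.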
