2017-01-19 5 views
3

Python - Pandas DataFrame transposition

I want to transpose the following DataFrame so that I can export it to an Oracle table. It should look like this:

enter image description here

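The rows for the second ID should start the same way, so that I can easily load everything into my Oracle database. This is the DataFrame as it stands: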

0 ID         Available Quota \ 
1 1724  GOM COD GOM HADD GOM BB GREYSOLE DABS GOM YT 
2 1578 GBE COD GBW COD GB BB GB YT SNE BB SNE YT GOM ... 
3 310 GBE COD GBW COD DABS WHAKE POLL RED SNE BB GOM BB 

0         Live Weight Pounds \ 
1      2328 445 3007 850 3101 1995 
2  538 5894 1755 243 490 153 3965 2727 9227 15060 
3 825 9033 1241 3120 65234 76610 1688 1195 2121 ... 

0            Price Date Posted 
1          Package $9,000  5/20 
2 $1.00 $0.40 $0.20 $1.00 $0.45 $0.50 $0.15 $0.2...  5/20 
3         Package $15,000  5/20 

Ideally, the data should be arranged like this:

enter image description here

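BTW, the original data table looks like this (my goal is to parse only the rows with the most recent date):

enter image description here

Using pd.transpose changed nothing, apparently because my DataFrame has shape (3, 5) and would need to be (5, 5) for that to work. And using pd.melt() gives: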

     0            value 
0     ID            1724 
1     ID            1578 
2     ID            310 
3  Available Quota  GOM COD GOM HADD GOM BB GREYSOLE DABS GOM YT 
4  Available Quota GBE COD GBW COD GB BB GB YT SNE BB SNE YT GOM ... 
5  Available Quota GBE COD GBW COD DABS WHAKE POLL RED SNE BB GOM BB 
6 Live Weight Pounds      2328 445 3007 850 3101 1995 
7 Live Weight Pounds  538 5894 1755 243 490 153 3965 2727 9227 15060 
8 Live Weight Pounds 825 9033 1241 3120 65234 76610 1688 1195 2121 ... 
9    Price          Package $9,000 
10    Price $1.00 $0.40 $0.20 $1.00 $0.45 $0.50 $0.15 $0.2... 
11    Price         Package $15,000 
12   Date Posted            5/20 
13   Date Posted            5/20 
14   Date Posted            5/20 

...which does not work for the export either.
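To make the two results above concrete, here is a minimal sketch on a toy frame of the same (3, 5) shape. The cell values are shortened stand-ins, not the real data. Transposing only swaps rows and columns, and melting only stacks the five columns into name/value pairs, so neither call splits a multi-value cell into one row per species.

```python
import pandas as pd

# Toy frame with the same shape as the parsed table: 3 rows, 5 columns.
# The multi-value cells are shortened stand-ins for the real ones.
df = pd.DataFrame({
    "ID": ["1724", "1578", "310"],
    "Available Quota": ["GOM COD GOM HADD", "GBE COD GBW COD", "GBE COD DABS"],
    "Live Weight Pounds": ["2328 445", "538 5894", "825 1241"],
    "Price": ["Package $9,000", "$1.00 $0.40", "Package $15,000"],
    "Date Posted": ["5/20", "5/20", "5/20"],
})

# Transposing swaps the axes; every cell keeps its packed string.
transposed = df.T

# Melting stacks the columns into (variable, value) pairs, 3 * 5 = 15 rows.
melted = df.melt()

print(transposed.shape)
print(melted.shape)
```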

My relevant code:

import io

import pandas as pd


def read_html_latest(filename, **kwargs):
    # Replace <br> with a space so multi-line cells survive parsing
    with open(filename, 'r') as f:
        text = f.read().replace('<br>', ' ')
    df = pd.read_html(io.StringIO(text), **kwargs)[0]
    # The first parsed row holds the column headers
    df.columns = df.loc[0]
    df = df.loc[1:]
    # Keep only the rows with the most recent posting date
    return df.assign(d=pd.to_datetime(df['Date Posted'], format='%m/%d')) \
             .query('d == d.max()') \
             .drop(columns='d')


# file_path is the path to the saved HTML e-mail
df = read_html_latest(file_path, attrs={'class': 'MsoNormalTable'})
print(df)

Any help solving this would be greatly appreciated, many thanks.

Source HTML code:

<html> 
<head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> 
<title>FW: NEFS 2 Available Quota 5/21</title> 
<link rel="important stylesheet" href=""> 
<style>div.headerdisplayname {font-weight:bold;}</style></head> 
<body> 
<table border=0 cellspacing=0 cellpadding=0 width="100%" class="header-part1"><tr><td><b>Subject: </b>FW: NEFS 2 Available Quota 5/21</td></tr><tr><td><b>From: </b>Claire Fitz-Gerald <[email protected]></td></tr><tr><td><b>Date: </b>5/21/2014 10:08 AM</td></tr></table><br> 
<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"><head><META HTTP-EQUIV="Content-Type" CONTENT="text/html; "><meta name=Generator content="Microsoft Word 12 (filtered medium)"><!--[if !mso]><style>v\:* {behavior:url(#default#VML);} 
o\:* {behavior:url(#default#VML);} 
w\:* {behavior:url(#default#VML);} 
.shape {behavior:url(#default#VML);} 
</style><![endif]--><style><!-- 
/* Font Definitions */ 
@font-face 
    {font-family:"Cambria Math"; 
    panose-1:2 4 5 3 5 4 6 3 2 4;} 
@font-face 
    {font-family:Calibri; 
    panose-1:2 15 5 2 2 2 4 3 2 4;} 
@font-face 
    {font-family:Tahoma; 
    panose-1:2 11 6 4 3 5 4 4 2 4;} 
@font-face 
    {font-family:"Franklin Gothic Book"; 
    panose-1:2 11 5 3 2 1 2 2 2 4;} 
@font-face 
    {font-family:"Franklin Gothic Demi"; 
    panose-1:2 11 7 3 2 1 2 2 2 4;} 
/* Style Definitions */ 
p.MsoNormal, li.MsoNormal, div.MsoNormal 
    {margin:0in; 
    margin-bottom:.0001pt; 
    font-size:11.0pt; 
    font-family:"Calibri","sans-serif";} 
a:link, span.MsoHyperlink 
    {mso-style-priority:99; 
    color:blue; 
    text-decoration:underline;} 
a:visited, span.MsoHyperlinkFollowed 
    {mso-style-priority:99; 
    color:purple; 
    text-decoration:underline;} 
span.EmailStyle17 
    {mso-style-type:personal; 
    font-family:"Calibri","sans-serif"; 
    color:windowtext;} 
span.title1 
    {mso-style-name:title1; 
    font-family:"Arial","sans-serif"; 
    color:#1F487E; 
    font-weight:normal;} 
span.EmailStyle19 
    {mso-style-type:personal-reply; 
    font-family:"Calibri","sans-serif"; 
    color:#1F497D;} 
.MsoChpDefault 
    {mso-style-type:export-only; 
    font-size:10.0pt;} 
@page WordSection1 
    {size:8.5in 11.0in; 
    margin:1.0in 1.0in 1.0in 1.0in;} 
div.WordSection1 
    {page:WordSection1;} 
--></style><!--[if gte mso 9]><xml> 
<o:shapedefaults v:ext="edit" spidmax="1026" /> 
</xml><![endif]--><!--[if gte mso 9]><xml> 
<o:shapelayout v:ext="edit"> 
<o:idmap v:ext="edit" data="1" /> 
</o:shapelayout></xml><![endif]--></head><body lang=EN-US link=blue vlink=purple><div class=WordSection1><p class=MsoNormal><span style='color:#1F497D'>Please see the below quota listings.<o:p></o:p></span></p><p class=MsoNormal><span style='color:#1F497D'><o:p>&nbsp;</o:p></span></p><p class=MsoNormal><span style='color:#1F497D'>Thanks,<o:p></o:p></span></p><p class=MsoNormal><span style='color:#1F497D'><o:p>&nbsp;</o:p></span></p><div><p class=MsoNormal><span style='font-size:12.0pt;font-family:"Franklin Gothic Book","sans-serif";color:#1F497D'>Claire Fitz-Gerald<o:p></o:p></span></p><p class=MsoNormal><i><span style='font-size:10.0pt;font-family:"Franklin Gothic Book","sans-serif";color:#1F497D'><o:p>&nbsp;</o:p></span></i></p><p class=MsoNormal><b><span style='font-family:"Franklin Gothic Demi","sans-serif";color:#002776'>Cape Cod Commercial Fishermen's Alliance<o:p></o:p></span></b></p><p class=MsoNormal><b><span style='font-family:"Franklin Gothic Book","sans-serif";color:#DE3500'>~ Small Boats.&nbsp; Big Ideas. 
~</span></b><b><span style='color:#DE3500'><o:p></o:p></span></b></p></div><p class=MsoNormal><span style='color:#1F497D'><o:p>&nbsp;</o:p></span></p><div><div style='border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in'><p class=MsoNormal><b><span style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'>From:</span></b><span style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'> David Leveille [mailto:[email protected]] <br><b>Sent:</b> Wednesday, May 21, 2014 8:50 AM<br><b>To:</b> David Leveille<br><b>Subject:</b> NEFS 2 Available Quota 5/21<o:p></o:p></span></p></div></div><p class=MsoNormal><o:p>&nbsp;</o:p></p><p class=MsoNormal><span style='font-size:12.0pt;font-family:"Arial","sans-serif";color:#1F487E'>AVAILABLE QUOTA FY 2014</span><span style='font-size:12.0pt;font-family:"Times New Roman","serif"'><o:p></o:p></span></p><table class=MsoNormalTable border=0 cellspacing=0 cellpadding=0 width="71%" style='width:71.28%'><tr><td width=220 style='width:164.95pt;border:none;border-bottom:solid windowtext 1.0pt;background:#8BCDFF;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><b><span style='font-size:9.0pt;font-family:"Arial","sans-serif";color:black'>ID <o:p></o:p></span></b></p></td><td width=161 style='width:120.75pt;border:none;border-bottom:solid windowtext 1.0pt;background:#8BCDFF;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='mso-line-height-alt:15.0pt'><b><span style='font-size:18.0pt;font-family:"Arial","sans-serif";color:black'>Available Quota <o:p></o:p></span></b></p></td><td width=189 style='width:141.75pt;border:none;border-bottom:solid windowtext 1.0pt;background:#8BCDFF;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='mso-line-height-alt:15.0pt'><b><span style='font-size:18.0pt;font-family:"Arial","sans-serif";color:black'>Live Weight Pounds <o:p></o:p></span></b></p></td><td width=126 style='width:94.55pt;border:none;border-bottom:solid windowtext 
1.0pt;background:#8BCDFF;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='mso-line-height-alt:15.0pt'><b><span style='font-size:18.0pt;font-family:"Arial","sans-serif";color:black'>Price <o:p></o:p></span></b></p></td><td width=168 style='width:125.95pt;border:none;border-bottom:solid windowtext 1.0pt;background:#8BCDFF;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='mso-line-height-alt:15.0pt'><b><span style='font-size:18.0pt;font-family:"Arial","sans-serif";color:black'>Date Posted <o:p></o:p></span></b></p></td></tr><tr><td width=220 style='width:164.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>1724<o:p></o:p></span></p></td><td width=161 style='width:120.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>GOM COD<br>GOM HADD<br>GOM BB<br>GREYSOLE<br>DABS<br>GOM YT<o:p></o:p></span></p></td><td width=189 style='width:141.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>2328<br>445<br>3007<br>850<br>3101<br>1995<o:p></o:p></span></p></td><td width=126 style='width:94.55pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>Package<o:p></o:p></span></p><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'><o:p>&nbsp;</o:p></span></p><p class=MsoNormal style='line-height:15.0pt'><span 
style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>$9,000<o:p></o:p></span></p></td><td width=168 style='width:125.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>5/20<o:p></o:p></span></p></td></tr><tr><td width=220 style='width:164.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>1578<o:p></o:p></span></p></td><td width=161 style='width:120.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>GBE COD<br>GBW COD<br>GB BB<br>GB YT<br>SNE BB<br>SNE YT<br>GOM BB<br>Whake<br>POLL<br>RED<o:p></o:p></span></p></td><td width=189 style='width:141.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>538<br>5894<br>1755<br>243<br>490<br>153<br>3965<br>2727<br>9227<br>15060<o:p></o:p></span></p></td><td width=126 style='width:94.55pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>$1.00<br>$0.40<br>$0.20<br>$1.00<br>$0.45<br>$0.50<br>$0.15<br>$0.20<br>$0.01<br>$0.01<o:p></o:p></span></p></td><td width=168 style='width:125.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>5/20<o:p></o:p></span></p></td></tr><tr><td 
width=220 style='width:164.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>310<o:p></o:p></span></p></td><td width=161 style='width:120.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>GBE COD<br>GBW COD<br>DABS<br>WHAKE<br>POLL<br>RED<br>SNE BB<br>GOM BB<o:p></o:p></span></p></td><td width=189 style='width:141.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>825<br>9033<br>1241<br>3120<br>65234<br>76610<br>1688<br>1195<br>2121<br>7285<o:p></o:p></span></p></td><td width=126 style='width:94.55pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>Package<o:p></o:p></span></p><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'><o:p>&nbsp;</o:p></span></p><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>$15,000<o:p></o:p></span></p></td><td width=168 style='width:125.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>5/20<o:p></o:p></span></p></td></tr><tr style='height:23.25pt'><td width=220 style='width:164.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt;height:23.25pt'><p class=MsoNormal 
style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>347<o:p></o:p></span></p></td><td width=161 style='width:120.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt;height:23.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>SNE BB<o:p></o:p></span></p></td><td width=189 style='width:141.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt;height:23.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>8,000<o:p></o:p></span></p></td><td width=126 style='width:94.55pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt;height:23.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>$0.50<o:p></o:p></span></p></td><td width=168 style='width:125.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt;height:23.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>5/7<o:p></o:p></span></p></td></tr><tr><td width=220 style='width:164.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>1878A<o:p></o:p></span></p></td><td width=161 style='width:120.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>GOM COD<br>GOM HADD<br>SNE BB<br>GOM BB<br>GB BB<br>GREYSOLE<br>GOM YT<br>SNE YT<br>POLL<o:p></o:p></span></p></td><td width=189 style='width:141.75pt;border:solid windowtext 
1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>6188<br>635<br>3916<br>7873<br>6762<br>3358<br>9776<br>271<br>186550<o:p></o:p></span></p></td><td width=126 style='width:94.55pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>$1.95<br>$1.35<br>$0.50<br>$0.50<br>$0.20<br>$1.40<br>$1.20<br>$0.50<br>$0.01<o:p></o:p></span></p></td><td width=168 style='width:125.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>5/12<o:p></o:p></span></p></td></tr><tr><td width=220 style='width:164.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>1878B<o:p></o:p></span></p></td><td width=161 style='width:120.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>GBE COD<br>GBW COD<br>GB YT<o:p></o:p></span></p></td><td width=189 style='width:141.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>1113<br>12186<br>850<o:p></o:p></span></p></td><td width=126 style='width:94.55pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span 
style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>Package<br>$10,000<o:p></o:p></span></p></td><td width=168 style='width:125.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>5/12<o:p></o:p></span></p></td></tr></table><p class=MsoNormal><o:p>&nbsp;</o:p></p><p class=MsoNormal><o:p>&nbsp;</o:p></p><p class=MsoNormal>David Leveille<o:p></o:p></p><p class=MsoNormal>II Northeast Fishery Sector Inc.<o:p></o:p></p><p class=MsoNormal>10 Witham Street<o:p></o:p></p><p class=MsoNormal>Gloucester, MA. 01930<o:p></o:p></p><p class=MsoNormal>Cell 978 375 3509<o:p></o:p></p><p class=MsoNormal>Fax 978 281 1555<o:p></o:p></p><p class=MsoNormal>Web <a href="http://nefs2.com/">http://nefs2.com/</a><o:p></o:p></p><p class=MsoNormal><o:p>&nbsp;</o:p></p><div class=MsoNormal align=center style='text-align:center'><span style='font-size:12.0pt;font-family:"Times New Roman","serif"'></body></html> 
</body> 
</html> 
+1

Also, how do you want the prices for row 2? – Shijo

+0

There is a finite number of species in the Available Quota column. I don't know why it prints only 7 or 8 of them and then puts "...". As for the prices in row 2, each of them should line up with its corresponding quota – theprowler

+1

Thanks, that makes more sense now :) – Shijo

Answers

2

This code works: it reads each cell into a list and then builds a DataFrame from those lists. It only works if every cell in a row holds the same number of items.

from bs4 import BeautifulSoup, NavigableString
import pandas as pd
import numpy as np


def celltext(cell):
    """Return the text pieces of the cell's first <span>, split at <br> tags."""
    span = cell.find('span')
    return [str(child) for child in span.children
            if isinstance(child, NavigableString)]


with open(r'path\to\html.html', 'r') as f:
    html = f.read()
soup = BeautifulSoup(html, 'lxml')  # Parse the HTML string
table = soup.find_all('table')[1]  # Grab the second table

frames = []  # one DataFrame per source table row

for row in table.find_all('tr'):
    columns = row.find_all('td')
    if columns[0].get_text().strip() != 'ID':  # skip header
        Quota = celltext(columns[1])
        Weight = celltext(columns[2])
        price = celltext(columns[3])

        # the longest cell decides how many output rows this ID gets
        Nrows = max(len(Quota), len(Weight), len(price))

        IDList = [columns[0].get_text()] * Nrows
        DateList = [columns[4].get_text()] * Nrows

        if price[0].strip() == 'Package':
            # a package price applies to every row of this ID
            price = [columns[3].get_text()] * Nrows

        if len(Quota) < len(Weight):
            # if Quota has fewer items, pad it with NaN
            Quota.extend([np.nan] * (len(Weight) - len(Quota)))

        # append inside the if, so the header row never adds a frame
        frames.append(pd.DataFrame({
            'ID': IDList,
            'AvailableQuota': Quota,
            'LiveWeightPounds': Weight,
            'price': price,
            'DatePosted': DateList,
        }))

df_Quota = pd.concat(frames)
print(df_Quota)

How do you decide how many text values there will be in 'Available Quota'? I can see more than one text value.

Output:

AvailableQuota DatePosted  ID LiveWeightPounds   price 
0  GOM COD  5/12 1878A    6188   $1.95 
1  GOM HADD  5/12 1878A    635   $1.35 
2   SNE BB  5/12 1878A    3916   $0.50 
3   GOM BB  5/12 1878A    7873   $0.50 
4   GB BB  5/12 1878A    6762   $0.20 
5  GREYSOLE  5/12 1878A    3358   $1.40 
6   GOM YT  5/12 1878A    9776   $1.20 
7   SNE YT  5/12 1878A    271   $0.50 
8   POLL  5/12 1878A   186550   $0.01 
0  GOM COD  5/20 1724    2328 Package $9,000 
1  GOM HADD  5/20 1724    445 Package $9,000 
2   GOM BB  5/20 1724    3007 Package $9,000 
3  GREYSOLE  5/20 1724    850 Package $9,000 
4   DABS  5/20 1724    3101 Package $9,000 
5   GOM YT  5/20 1724    1995 Package $9,000 
0  GBE COD  5/20 1578    538   $1.00 
1  GBW COD  5/20 1578    5894   $0.40 
2   GB BB  5/20 1578    1755   $0.20 
3   GB YT  5/20 1578    243   $1.00 
4   SNE BB  5/20 1578    490   $0.45 
5   SNE YT  5/20 1578    153   $0.50 
6   GOM BB  5/20 1578    3965   $0.15 
7   Whake  5/20 1578    2727   $0.20 
8   POLL  5/20 1578    9227   $0.01 
9   RED  5/20 1578   15060   $0.01 
0  GBE COD  5/20 310    825 Package $15,000 
1  GBW COD  5/20 310    9033 Package $15,000 
2   DABS  5/20 310    1241 Package $15,000 
3   WHAKE  5/20 310    3120 Package $15,000 
4   POLL  5/20 310   65234 Package $15,000 
5   RED  5/20 310   76610 Package $15,000 
6   SNE BB  5/20 310    1688 Package $15,000 
7   GOM BB  5/20 310    1195 Package $15,000 
8   NaN  5/20 310    2121 Package $15,000 
9   NaN  5/20 310    7285 Package $15,000 
0   SNE BB  5/7 347   8,000   $0.50 
0  GOM COD  5/12 1878A    6188   $1.95 
1  GOM HADD  5/12 1878A    635   $1.35 
2   SNE BB  5/12 1878A    3916   $0.50 
3   GOM BB  5/12 1878A    7873   $0.50 
4   GB BB  5/12 1878A    6762   $0.20 
5  GREYSOLE  5/12 1878A    3358   $1.40 
6   GOM YT  5/12 1878A    9776   $1.20 
7   SNE YT  5/12 1878A    271   $0.50 
8   POLL  5/12 1878A   186550   $0.01 
0  GBE COD  5/12 1878B    1113 Package$10,000 
1  GBW COD  5/12 1878B   12186 Package$10,000 
2   GB YT  5/12 1878B    850 Package$10,000 
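As a possible alternative for the reshaping step alone, here is a minimal sketch using `DataFrame.explode`. It assumes the cells have already been split into lists, for example with `celltext` above, and that pandas 1.3 or newer is installed, since exploding several columns at once needs that version. The frame `wide` and its abbreviated values are illustrative, not the full table. The lists in one row must have equal lengths, so shorter lists would still need NaN padding first.

```python
import pandas as pd

# One row per ID, each multi-value cell already split into a list.
# Values are abbreviated from the table; the frame itself is illustrative.
wide = pd.DataFrame({
    "ID": ["1724", "1878B"],
    "AvailableQuota": [["GOM COD", "GOM HADD", "GOM BB"],
                       ["GBE COD", "GBW COD", "GB YT"]],
    "LiveWeightPounds": [["2328", "445", "3007"],
                         ["1113", "12186", "850"]],
    "DatePosted": ["5/20", "5/12"],
})

# Explode both list columns together: one output row per list element,
# with the scalar columns (ID, DatePosted) repeated for each element.
long = wide.explode(["AvailableQuota", "LiveWeightPounds"], ignore_index=True)

print(long)
```

The scalar columns are repeated automatically, which replaces the manual `[value] * Nrows` bookkeeping.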
+0

Wow! Nice! – MaxU

+0

Wow, that looks absolutely perfect. Would it be easy to exclude every date except the most recent one? In this case, keeping only the data for 5/20? – theprowler

+1

In the given example the number of items differs in HTML row 3: the 'Available Quota' column has 8 items while 'Live Weight Pounds' has 10. How do you handle those? – Shijo
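On the follow-up about keeping only the most recent date, a minimal sketch could look like the following. It assumes a long-format frame with a `DatePosted` column in month/day form, as in the output above. The table carries no year, so 2014 is an assumption taken from the e-mail header, and the small frame `df_quota` is a stand-in with one row per ID.

```python
import pandas as pd

# Stand-in for the long-format result: one row per ID is enough here.
df_quota = pd.DataFrame({
    "ID": ["1724", "1578", "347", "1878A"],
    "DatePosted": ["5/20", "5/20", "5/7", "5/12"],
})

# The table has no year; 2014 is assumed from the e-mail date.
posted = pd.to_datetime(df_quota["DatePosted"] + "/2014", format="%m/%d/%Y")

# Keep only the rows whose date equals the latest one.
latest = df_quota[posted == posted.max()]

print(latest)
```

Comparing parsed dates rather than the raw strings matters because "5/7" sorts after "5/20" as text.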
