Pandas apply on a DataFrame groupby

Using groupby (the g object below) to apply the following function to the first 1000 rows of df works. If I apply it to the whole DataFrame, I get this exception:

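def calc_load(x):
    # sort_values returns a new frame, so rebind x
    # (the original x.sort('log_timestamp') discarded its result)
    x = x.sort_values('log_timestamp')
    # broadcast the per-group statistics to every row of the group
    x['time_stddev'] = x['time'].std()
    x['time_mean'] = x['time'].mean()
    return x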


    c=g.apply(calc_load) 
    --------------------------------------------------------------------------- 
    ........ 

    ValueError        Traceback (most recent call last) 
    <ipython-input-262-f2fe1f013907> in <module>() 
    ----> 1 c=g.apply(calc_load) 
     2215    tuple(map(int, [tot_items] + list(block_shape))), 
    -> 2216    tuple(map(int, [len(ax) for ax in axes])))) 
     2217 
     2218 

    ValueError: Shape of passed values is (10, 3943482), indices imply (10, 410450)

What causes this, and how can I fix it?

UPDATE:

I am reading this table from an HDF5 store like this:

prob2 
Out[374]: 
<class 'pandas.io.pytables.HDFStore'> 
File path: /tmp/test2.h5 
/mytable   frame_table (typ->appendable,nrows->410450,ncols->8,indexers->[index]) 

a=prob2.mytable 

a 
Out[376]: 
<class 'pandas.core.frame.DataFrame'> 
Int64Index: 410450 entries, 0 to 9999 
Data columns (total 8 columns): 
args    410450 non-null values 
host    410450 non-null values 
kwargs   410450 non-null values 
log_timestamp 410450 non-null values 
operation  410450 non-null values 
slot    410450 non-null values 
status   410450 non-null values 
time    410450 non-null values 
dtypes: float64(1), int64(2), object(5)

If I round-trip the frame through CSV as follows, the exception does not occur:

a.to_csv('/tmp/test2.csv') 

b=pd.read_csv('/tmp/test2.csv') 

b 
Out[379]: 
<class 'pandas.core.frame.DataFrame'> 
Int64Index: 410450 entries, 0 to 410449 
Data columns (total 9 columns): 
Unnamed: 0  410450 non-null values 
args    410450 non-null values 
host    410450 non-null values 
kwargs   410450 non-null values 
log_timestamp 410450 non-null values 
operation  410450 non-null values 
slot    410450 non-null values 
status   410450 non-null values 
time    410450 non-null values 
dtypes: float64(1), int64(3), object(5) 

bg = b.groupby(['host','operation']) 

bg.apply(calc_load) 
Out[381]: 
<class 'pandas.core.frame.DataFrame'> 
Int64Index: 410450 entries, 0 to 410449 
Data columns (total 11 columns): 
Unnamed: 0  410450 non-null values 
args    410450 non-null values 
host    410450 non-null values 
kwargs   410450 non-null values 
log_timestamp 410450 non-null values 
operation  410450 non-null values 
slot    410450 non-null values 
status   410450 non-null values 
time    410450 non-null values 
time_stddev  410371 non-null values 
time_mean  410450 non-null values 
dtypes: float64(3), int64(3), object(5)

The DataFrames before the round trip (a) and after it (b) are similar, but their shapes are not identical!

a 
Out[386]: 
<class 'pandas.core.frame.DataFrame'> 
Int64Index: 410450 entries, 0 to 9999 
Data columns (total 8 columns): 
args    410450 non-null values 
host    410450 non-null values 
kwargs   410450 non-null values 
log_timestamp 410450 non-null values 
operation  410450 non-null values 
slot    410450 non-null values 
status   410450 non-null values 
time    410450 non-null values 
dtypes: float64(1), int64(2), object(5) 



b 
Out[387]: 
<class 'pandas.core.frame.DataFrame'> 
Int64Index: 410450 entries, 0 to 410449 
Data columns (total 9 columns): 
Unnamed: 0  410450 non-null values 
args    410450 non-null values 
host    410450 non-null values 
kwargs   410450 non-null values 
log_timestamp 410450 non-null values 
operation  410450 non-null values 
slot    410450 non-null values 
status   410450 non-null values 
time    410450 non-null values 
dtypes: float64(1), int64(3), object(5)

Anyway, what is going on here?
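One difference visible in the two summaries above is the index: a runs from 0 to 9999 across 410450 rows, while b runs from 0 to 410449. The following is a minimal sketch of how a CSV round trip produces that change. It assumes a hypothetical four-row frame and an in-memory buffer in place of /tmp/test2.csv.

```python
import io

import pandas as pd

# Hypothetical stand-in for the frame read from the HDF5 store:
# the index labels repeat, as in "410450 entries, 0 to 9999".
a = pd.DataFrame({"time": [0.1, 0.2, 0.3, 0.4]}, index=[0, 1, 0, 1])

# Round trip through CSV, using an in-memory buffer instead of a file.
buffer = io.StringIO()
a.to_csv(buffer)   # the index is written as an unnamed first column
buffer.seek(0)
b = pd.read_csv(buffer)  # the old index comes back as an 'Unnamed: 0' column

print(a.index.is_unique)  # the repeated labels are still in the index
print(b.index.is_unique)  # read_csv assigned a fresh 0..n-1 index
print(list(b.columns))
```

The extra 'Unnamed: 0' column in b is the old index, which accounts for the 8 versus 9 columns reported above.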

Source

2013-12-19 LetMeSOThat4U

Can you provide a working example, maybe using Dropbox to supply the frame (or create an example that shows the error)? – Jeff

@Jeff, it is in the UPDATE. And thanks for all the help! – LetMeSOThat4U

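Can you show df.head() so the values are visible? It looks like you have string-like columns (they show up as object dtype). Object dtypes should be string-like. You may need to do some conversion before putting the data into HDF5. Where did the data come from before that step? – Jeff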

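Once you group by host/operation, the index contains many duplicates (note that the frame read from the store reports 410450 entries but index labels from 0 to 9999). This is probably why the first 1000 rows work but the entire set does not.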
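A quick way to confirm whether a frame has repeated index labels is sketched below. The toy data is hypothetical; it only mimics a table that was appended in chunks, each chunk carrying its own 0-based index.

```python
import pandas as pd

# Hypothetical toy data: one chunk with index 0, 1, 2 ...
chunk = pd.DataFrame({"host": ["h1", "h2", "h1"], "time": [0.1, 0.2, 0.3]})

# ... concatenated with itself, so the index becomes 0, 1, 2, 0, 1, 2.
df = pd.concat([chunk, chunk])

# True only when every index label occurs exactly once.
index_is_unique = df.index.is_unique

# Number of rows whose label already appeared earlier in the index.
n_duplicate_labels = int(df.index.duplicated().sum())

print(index_is_unique, n_duplicate_labels)
```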

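Reset the index first, then group and apply. You can recover the original index by setting the index at the end: reset_index turns the old index into a column named 'index', and set_index removes that column again.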

This is actually a pretty common pattern. A more helpful error message may be in order (see here). I am not sure whether groupby should fix this automatically; it could be a user error or it could be intentional.

In [26]: df = d.reset_index().groupby(['host','operation']).apply(calc_load).set_index('index') 

In [27]: df 
Out[27]: 
     args   host kwargs  log_timestamp    operation  slot status  time time_stddev time_mean 
index                               
0  [] yy3.segm1.org  {} 1385984306000000000  x_gWidgboxParams a12yy3 -101 0.000477  0.061657 0.003226 
1  [] yy14.segm1.org  {} 1385984306000000000   x_initWidgbox a11yy14  1 0.004177  0.035759 0.005816 
10  [] yy32.segm1.org  {} 1385984307000000000    gSettings a13yy32 -101 0.009686  0.245170 0.070137 
100  [] yy19.segm1.org  {} 1385984308000000000 notifyTestsDelivered a16yy19  1 0.000766  0.002825 0.000964 
1000 [] yy7.segm1.org  {} 1385984320000000000   addWidging2 a12yy7  1 0.002576  0.008525 0.004122 
10000 [] yy14.segm1.org  {} 1385984461000000000   addWidging2 a13yy14  1 0.001317  0.009431 0.003910 
10001 [] yy14.segm1.org  {} 1385984461000000000    gxyzinf a13yy14 -101 0.000542  0.001861 0.001074 
10002 [] yy20.segm1.org  {} 1385984461000000000    x_gbinf I502yy20 -101 0.000522  0.001043 0.000743 
10003 [] yy20.segm1.org  {} 1385984461000000000  setFlagsOneWidg I502yy20  1 0.001660  0.005404 0.002910 
10004 [] yy14.segm1.org  {} 1385984461000000000 notifyTestsDelivered a13yy14  1 0.000551  0.002877 0.001156 
10005 [] yy20.segm1.org  {} 1385984461000000000    gxyzinf I502yy20 -101 0.000521  0.000802 0.000813 
10006 [] yy14.segm1.org  {} 1385984461000000000   addWidging2 a13yy14  1 0.001256  0.009431 0.003910 
10007 [] yy14.segm1.org  {} 1385984461000000000    gxyzinf a13yy14 -101 0.000414  0.001861 0.001074 
10008 [] yy14.segm1.org  {} 1385984461000000000   addWidging2 a13yy14  1 0.001222  0.009431 0.003910 
10009 [] yy14.segm1.org  {} 1385984461000000000    gxyzinf a13yy14 -101 0.000475  0.001861 0.001074 
1001 [] yy7.segm1.org  {} 1385984320000000000    gxyzinf a12yy7 -101 0.000783  0.003059 0.001004 
10010 [] yy14.segm1.org  {} 1385984461000000000   x_initWidgbox a12yy14  1 0.002764  0.035759 0.005816 
10011 [] yy32.segm1.org  {} 1385984461000000000   x_initWidgbox a15yy32  1 0.057966  0.334923 0.147668 
10012 [] yy3.segm1.org  {} 1385984461000000000    gSettings a11yy3 -101 0.006519  0.163707 0.017649 
10013 [] yy30.segm1.org  {} 1385984461000000000    gtfull a13yy30 -101 0.003648  0.116366 0.014088 
10014 [] yy6.segm1.org  {} 1385984461000000000    x_gbinf a16yy6 -101 0.000621  0.005796 0.001139 
10015 [] yy34.segm1.org  {} 1385984461000000000    gtfull a14yy34 -101 0.002031  0.015581 0.007747 
10016 [] yy34.segm1.org  {} 1385984461000000000    x_gbinf a14yy34 -101 0.000546  0.002596 0.001899 
10017 [] yy34.segm1.org  {} 1385984461000000000  setFlagsOneWidg a14yy34  1 0.001358  0.003515 0.005866 
10018 [] yy34.segm1.org  {} 1385984461000000000    gxyzinf a14yy34 -101 0.000486  0.004446 0.002018 
10019 [] yy25.segm1.org  {} 1385984461000000000    gtfull a13yy25 -101 0.002029  0.001793 0.002355 
1002 [] yy7.segm1.org  {} 1385984320000000000 notifyTestsDelivered a12yy7  1 0.000847  0.003748 0.001081 
10020 [] yy32.segm1.org  {} 1385984462000000000    gFolderId a15yy32 -101 0.018326  0.187434 0.058200 
10021 [] yy25.segm1.org  {} 1385984462000000000    x_gbinf a13yy25 -101 0.000589  0.001716 0.000830 
10022 [] yy25.segm1.org  {} 1385984462000000000   updateWidg a13yy25  1 0.003058  0.004660 0.003973 
10023 [] yy25.segm1.org  {} 1385984462000000000   clearElems a13yy25  1 0.000661  0.004893 0.001687 
10024 [] yy10.segm1.org  {} 1385984462000000000    gtfull a18yy10 -101 0.002779  0.069679 0.007495 
10025 [] yy13.segm1.org  {} 1385984462000000000    gtfull a11yy13 -101 0.001978  0.124069 0.012524 
10026 [] yy32.segm1.org  {} 1385984462000000000    x_gbinf a14yy32 -101 0.018674  0.190657 0.058083 
10027 [] yy10.segm1.org  {} 1385984462000000000    x_gbinf a18yy10 -101 0.000874  0.007170 0.001606 
10028 [] yy32.segm1.org  {} 1385984462000000000    gWidgId a14yy32  1 0.014523  1.518315 0.559983 
10029 [] yy13.segm1.org  {} 1385984462000000000    x_gbinf a11yy13 -101 0.000577  0.008605 0.001130 
1003 [] yy7.segm1.org  {} 1385984320000000000  x_gWidgboxParams a12yy7 -101 0.000933  0.001084 0.001442 
10030 [] yy13.segm1.org  {} 1385984462000000000  setFlagsOneWidg a11yy13  1 0.001611  0.011409 0.004093 
10031 [] yy13.segm1.org  {} 1385984462000000000    gxyzinf a11yy13 -101 0.000575  0.053991 0.003044 
10032 [] yy39.segm1.org  {} 1385984462000000000    gtfull a13yy39 -101 0.002005  0.034577 0.003504 
10033 [] yy39.segm1.org  {} 1385984462000000000    x_gbinf a13yy39 -101 0.000539  0.001371 0.000931 
10034 [] yy32.segm1.org  {} 1385984462000000000   addWidging2 a15yy32  1 0.122369  1.414068 0.441565 
10035 [] yy32.segm1.org  {} 1385984462000000000   moveOneWidg a12yy32  1 0.468481  1.303089 0.665778 
10036 [] yy32.segm1.org  {} 1385984462000000000    gxyzinf a15yy32 -101 0.018006  0.155379 0.040389 
10037 [] yy32.segm1.org  {} 1385984462000000000 notifyTestsDelivered a15yy32  1 0.006874  0.129650 0.032741 
10038 [] yy32.segm1.org  {} 1385984462000000000    gxyzinf a12yy32 -101 0.016607  0.155379 0.040389 
10039 [] yy39.segm1.org  {} 1385984462000000000   updateWidg a13yy39  1 0.003879  0.005466 0.006465 
1004 [] yy34.segm1.org  {} 1385984320000000000    gtfull a11yy34 -101 0.003681  0.015581 0.007747 
10040 [] yy39.segm1.org  {} 1385984462000000000    SELECT a13yy39 217831 0.000423  0.000126 0.000551 
10041 [] yy39.segm1.org  {} 1385984462000000000   clearElems a13yy39  1 0.000705  0.002367 0.001356 
10042 [] yy3.segm1.org  {} 1385984462000000000   moveOneWidg a15yy3  1 0.002660  0.027428 0.009078 
10043 [] yy3.segm1.org  {} 1385984462000000000    gxyzinf a15yy3 -101 0.000436  0.041627 0.001913 
10044 [] yy39.segm1.org  {} 1385984462000000000    gSettings a11yy39 -101 0.002237  0.007467 0.002679 
10045 [] yy32.segm1.org  {} 1385984462000000000    gSettings a15yy32 -101 0.012113  0.245170 0.070137 
10046 [] yy32.segm1.org  {} 1385984462000000000  x_gWidgboxParams a15yy32 -101 0.030427  0.143941 0.050055 
10047 [] yy13.segm1.org  {} 1385984462000000000   moveOneWidg a12yy13  1 0.003796  0.117085 0.017910 
10048 [] yy13.segm1.org  {} 1385984462000000000    gxyzinf a12yy13 -101 0.000521  0.053991 0.003044 
10049 [] yy30.segm1.org  {} 1385984462000000000  x_gWidgboxParams a13yy30 -101 0.002451  0.051829 0.003644 
1005 [] yy12.segm1.org  {} 1385984320000000000    gtfull a15yy12 -101 0.003428  0.005479 0.003063 
     ...    ... ...     ...     ...  ...  ...  ...   ...  ... 

[410450 rows x 10 columns]
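For readers who want to try the reset/group/restore pattern on a small frame, here is a self-contained sketch. Note the swap: it uses groupby(...).transform instead of apply(calc_load), on the assumption that only the per-group std and mean columns are needed, and the toy data and column values are hypothetical.

```python
import pandas as pd

# Hypothetical frame with repeated index labels (0, 1, 0, 1).
df = pd.DataFrame(
    {
        "host": ["h1", "h1", "h2", "h2"],
        "operation": ["op", "op", "op", "op"],
        "time": [1.0, 3.0, 2.0, 6.0],
    },
    index=[0, 1, 0, 1],
)

# Step 1: move the old index into a column named 'index';
# the frame now has a unique 0..n-1 index.
flat = df.reset_index()

# Step 2: compute per-group statistics, broadcast back to every row.
grouped = flat.groupby(["host", "operation"])["time"]
flat["time_stddev"] = grouped.transform("std")
flat["time_mean"] = grouped.transform("mean")

# Step 3: restore the original (duplicated) labels as the index.
result = flat.set_index("index")
print(result)
```

The transform variant skips the sort inside calc_load, which does not affect the std or mean of a group.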

Source

2013-12-20 13:56:52 Jeff
