2013-04-14 3 views
1

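Pandas pivot table subtotals for each column

Is it possible to use pivot_table in pandas to get the desired output (given below), or something close to it, from the following dataset? I am trying to do something like: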

pd.pivot_table(df, index=['region'], columns=['area', 'distributor', 'salesrep'],
               aggfunc='sum', margins=True).stack(['area', 'distributor', 'salesrep'])

but then I only get totals per region. If I move area from the columns to the rows, I only get totals per area.

Dataset:

 
region area   distributor  salesrep  sales invoice_count 
Central Butterworth  HIN MARKETING TLS   500  25 
Central Butterworth  HIN MARKETING TLS   500  25 
Central Butterworth  HIN MARKETING OSE   500  25 
Central Butterworth  HIN MARKETING OSE   500  25 
Central Butterworth  KWANG HENGG  TCS   500  25 
Central Butterworth  KWANG HENGG  TCS   500  25 
Central Butterworth  KWANG HENG  LBH   500  25 
Central Butterworth  KWANG HENG  LBH   500  25 
Central Ipoh   SGH EDERAN  CHAN   500  25 
Central Ipoh   SGH EDERAN  CHAN   500  25 
Central Ipoh   SGH EDERAN  KAMACHI  500  25 
Central Ipoh   SGH EDERAN  KAMACHI  500  25 
Central Ipoh   CORE SYN  LILIAN   500  25 
Central Ipoh   CORE SYN  LILIAN   500  25 
Central Ipoh   CORE SYN  TEOH   500  25 
Central Ipoh   CORE SYN  TEOH   500  25 
East  JB    LEI WAH   NF05   500  25 
East  JB    LEI WAH   NF05   500  25 
East  JB    LEI WAH   NF06   500  25 
East  JB    LEI WAH   NF06   500  25 
East  JB    WONDER F&B  SEREN   500  25 
East  JB    WONDER F&B  SEREN   500  25 
East  JB    WONDER F&B  MONC   500  25 
East  JB    WONDER F&B  MONC   500  25 
East  PJ    PENGEDAR  NORM   500  25 
East  PJ    PENGEDAR  NORM   500  25 
East  PJ    PENGEDAR  SIMON   500  25 
East  PJ    PENGEDAR  SIMON   500  25 
East  PJ    HEBAT   OGI   500  25 
East  PJ    HEBAT   OGI   500  25 
East  PJ    HEBAT   MIGI   500  25 
East  PJ    HEBAT   MIGI   500  25 

Desired output:

 
region  area   distributor  salesrep    invoice_count sales 
Grand Total                 800 16000 
Central  Central Total             400 8000 
Central  Butterworth Butterworth Total        200 4000 
Central  Butterworth HIN MARKETING  HIN MARKETING Total   100 2000 
Central  Butterworth HIN MARKETING  OSE        50 1000 
Central  Butterworth HIN MARKETING  TLS        50 1000 
Central  Butterworth KWANG HENG  KWANG HENG Total    100 2000 
Central  Butterworth KWANG HENG  LBH        50 1000 
Central  Butterworth KWANG HENG  TCS        50 1000 
Central  Ipoh   Ipoh Total          200 4000 
Central  Ipoh   CORE SYN   CORE SYN Total     100 2000 
Central  Ipoh   CORE SYN   LILIAN       50 1000 
Central  Ipoh   CORE SYN   TEOH       50 1000 
Central  Ipoh   SGH EDERAN  SGH EDERAN Total    100 2000 
Central  Ipoh   SGH EDERAN  CHAN       50 1000 
Central  Ipoh   SGH EDERAN  KAMACHI       50 1000 
East   East Total              400 8000 
East   JB   JB Total           200 4000 
East   JB   LEI WAH   LEI WAH Total     100 2000 
East   JB   LEI WAH   NF05       50 1000 
East   JB   LEI WAH   NF06       50 1000 
East   JB   WONDER F&B  WONDER F&B Total    100 2000 
East   JB   WONDER F&B  MONC       50 1000 
East   JB   WONDER F&B  SEREN       50 1000 
East   PJ   PJ Total           200 4000 
East   PJ   HEBAT    HEBAT Total     100 2000 
East   PJ   HEBAT    MIGI       50 1000 
East   PJ   HEBAT    OGI        50 1000 
East   PJ   PENGEDAR   PENGEDAR Total     100 2000 
East   PJ   PENGEDAR   NORM       50 1000 
East   PJ   PENGEDAR   SIMON       50 1000 

Answers

0

I don't know how to get the subtotals inside the table itself, but if you run

df.pivot_table(index=['region', 'area', 'distributor', 'salesrep'],
               aggfunc='sum', margins=True)

you will get

          invoice_count sales 
region area  distributor salesrep      
Central Butterworth HIN MARKETING OSE     50 1000 
            TLS     50 1000 
        KWANG HENG LBH     50 1000 
        KWANG HENGG TCS     50 1000 
     Ipoh  CORE SYN  LILIAN    50 1000 
            TEOH     50 1000 
        SGH EDERAN CHAN     50 1000 
            KAMACHI    50 1000 
East JB   LEI WAH  NF05     50 1000 
            NF06     50 1000 
        WONDER F&B MONC     50 1000 
            SEREN    50 1000 
     PJ   HEBAT   MIGI     50 1000 
            OGI     50 1000 
        PENGEDAR  NORM     50 1000 
            SIMON    50 1000 
All             800 16000 
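
To see why this does not produce subtotals, note that margins=True appends a single grand-total row (labelled All) and nothing per level. A minimal sketch on made-up data; the frame `demo` is hypothetical and is not the dataset from the question:

```python
import pandas as pd

# Hypothetical toy data, only to illustrate what margins=True adds.
demo = pd.DataFrame({
    'region': ['Central', 'Central', 'East', 'East'],
    'area': ['Butterworth', 'Ipoh', 'JB', 'PJ'],
    'sales': [1000, 2000, 3000, 4000],
})

pivoted = demo.pivot_table(index=['region', 'area'], values='sales',
                           aggfunc='sum', margins=True)

# Count the rows whose first index level is the margins label 'All'.
n_margin_rows = int((pivoted.index.get_level_values(0) == 'All').sum())
print(pivoted)
print(n_margin_rows)
```

There is one margin row for the whole table and no "Central Total" or "East Total" row, so the per-level subtotals have to be built separately.
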
If you want totals grouped by, say, region and area, you could run

df.pivot_table(index=['region', 'area'], values=['invoice_count', 'sales'],
               aggfunc='sum', margins=True)

which results in

     invoice_count sales 
region area        
Central Butterworth   200 4000 
     Ipoh     200 4000 
East JB      200 4000 
     PJ      200 4000 
All       800 16000 
+0

Thanks. I will try looping over each level of the hierarchy and applying pivot_table to get the desired result. – ogi

1

Instead of pivot_table, we can use groupby:

import numpy as np 
import pandas as pd 


def label(ser): 
    return '{s} Total'.format(s=ser) 

filename = 'data.txt' 
df = pd.read_table(filename, delimiter='\t') 

total = pd.DataFrame({'region': ['Grand Total'], 
         'invoice_count': df['invoice_count'].sum(), 
         'sales': df['sales'].sum()}) 
total['total_rank'] = 1 

# Sum only the numeric columns; recent pandas versions may otherwise
# also "sum" (concatenate) the leftover string columns.
value_cols = ['invoice_count', 'sales']

region_total = df.groupby(['region'], as_index=False)[value_cols].sum()
region_total['area'] = region_total['region'].apply(label)
region_total['region_rank'] = 1

area_total = df.groupby(['region', 'area'], as_index=False)[value_cols].sum()
area_total['distributor'] = area_total['area'].apply(label)
area_total['area_rank'] = 1

dist_total = df.groupby(
    ['region', 'area', 'distributor'], as_index=False)[value_cols].sum()
dist_total['salesrep'] = dist_total['distributor'].apply(label)

rep_total = df.groupby(
    ['region', 'area', 'distributor', 'salesrep'], as_index=False)[value_cols].sum()

# UNION the DataFrames into one DataFrame 
result = pd.concat([total, region_total, area_total, dist_total, rep_total]) 

# Replace NaNs with empty strings 
result.fillna({'region': '', 'area': '', 'distributor': '', 'salesrep': 
       ''}, inplace=True) 

# Reorder the rows 
sorter = np.lexsort((
    result['distributor'].rank(), 
    result['area_rank'].rank(), 
    result['area'].rank(), 
    result['region_rank'].rank(), 
    result['region'].rank(), 
    result['total_rank'].rank())) 
result = result.take(sorter) 
result = result.reindex(
    columns=['region', 'area', 'distributor', 'salesrep', 'invoice_count', 'sales']) 
print(result.to_string(index=False)) 

which yields

 region   area  distributor    salesrep invoice_count sales 
Grand Total                 800 16000 
    Central Central Total             400 8000 
    Central Butterworth Butterworth Total         200 4000 
    Central Butterworth  HIN MARKETING HIN MARKETING Total   100 2000 
    Central Butterworth  HIN MARKETING     OSE    50 1000 
    Central Butterworth  HIN MARKETING     TLS    50 1000 
    Central Butterworth   KWANG HENG  KWANG HENG Total   100 2000 
    Central Butterworth   KWANG HENG     LBH    50 1000 
    Central Butterworth   KWANG HENG     TCS    50 1000 
    Central   Ipoh   Ipoh Total         200 4000 
    Central   Ipoh   CORE SYN  CORE SYN Total   100 2000 
    Central   Ipoh   CORE SYN    LILIAN    50 1000 
    Central   Ipoh   CORE SYN     TEOH    50 1000 
    Central   Ipoh   SGH EDERAN  SGH EDERAN Total   100 2000 
    Central   Ipoh   SGH EDERAN     CHAN    50 1000 
    Central   Ipoh   SGH EDERAN    KAMACHI    50 1000 
     East  East Total             400 8000 
     East    JB   JB Total         200 4000 
     East    JB   LEI WAH  LEI WAH Total   100 2000 
     East    JB   LEI WAH     NF05    50 1000 
     East    JB   LEI WAH     NF06    50 1000 
     East    JB   WONDER F&B  WONDER F&B Total   100 2000 
     East    JB   WONDER F&B     MONC    50 1000 
     East    JB   WONDER F&B    SEREN    50 1000 
     East    PJ   PJ Total         200 4000 
     East    PJ    HEBAT   HEBAT Total   100 2000 
     East    PJ    HEBAT     MIGI    50 1000 
     East    PJ    HEBAT     OGI    50 1000 
     East    PJ   PENGEDAR  PENGEDAR Total   100 2000 
     East    PJ   PENGEDAR     NORM    50 1000 
     East    PJ   PENGEDAR    SIMON    50 1000
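
The same idea can be written as a loop over the hierarchy levels, so it does not need one hand-written block per level. This is only a sketch under some assumptions: the function name `with_subtotals`, the helper columns `_k0`, `_k1`, ... and the toy frame `demo` are made up for illustration, and the rows are ordered with plain sort keys instead of the np.lexsort on ranks used above. It also assumes no level value is an empty string.

```python
import pandas as pd


def with_subtotals(df, levels, values):
    """Return leaf rows plus a '<name> Total' row above each group."""
    pieces = []
    for depth in range(len(levels) + 1):
        keys = levels[:depth]
        if keys:
            part = df.groupby(keys, as_index=False)[values].sum()
        else:
            # depth 0: one row holding the grand total
            part = df[values].sum().to_frame().T
        # Sort keys: real values for the grouped levels, '' below them,
        # so every total row sorts ahead of the rows it summarises.
        for i, col in enumerate(levels):
            part['_k%d' % i] = part[col] if i < depth else ''
        # Write the label into the first level that is not grouped on.
        if depth < len(levels):
            part[levels[depth]] = (
                (part[keys[-1]] + ' Total') if keys else 'Grand Total')
        pieces.append(part)

    out = pd.concat(pieces, ignore_index=True)
    key_cols = ['_k%d' % i for i in range(len(levels))]
    out = out.sort_values(key_cols)
    out[levels] = out[levels].fillna('')
    return out[levels + values].reset_index(drop=True)


# Hypothetical toy data, not the dataset from the question.
demo = pd.DataFrame({
    'region': ['Central', 'Central', 'East'],
    'area': ['Ipoh', 'Ipoh', 'JB'],
    'salesrep': ['A', 'B', 'C'],
    'sales': [10, 20, 30],
})
out = with_subtotals(demo, ['region', 'area', 'salesrep'], ['sales'])
print(out.to_string(index=False))
```

For the dataset in the question the call would presumably be with_subtotals(df, ['region', 'area', 'distributor', 'salesrep'], ['invoice_count', 'sales']).
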