2017-05-16 2 views
1

pandas DataFrame: insert a value based on the range of another column's value

I have a DataFrame like the one below and want to insert a 'string' based on the value in the sic2 column.

        conm                      sic2
115466  ALLEGION PLC              34.0
115471  AGILITY HEALTH INC        80.0
115473  NORDIC AMERICAN OFFSHORE  44.0
115474  AAD                       54.0
115477  DORIAN LPG LTD            44.0
115484  NOMAD FOODS LTD           20.0
115486  ATHENE HOLDING LTD        63.0
115490  MIDATECH PHARMA PLC       28.0
115495  MOTIF BIO PLC             28.0

The sic2 number ranges and the strings they correspond to are as follows.

1-9 Agriculture, Forestry and Fishing 
10-14 Mining 
15-17 Construction 
18-19 not used 
20-39 Manufacturing 
40-49 Transportation, Communications, Electric, Gas and Sanitary service 
50-51 Wholesale Trade 
52-59 Retail Trade 
60-67 Finance, Insurance and Real Estate 
70-89 Services 
91-97 Public Administration 
99-99 Nonclassifiable 
0 -1 Agricultural Production-Crops 

How can I create a pandas.DataFrame that looks like the one below, applying this to an entire large dataset?

I tried code with multiple conditions, but it failed.

        conm                      sic2  industry
115466  ALLEGION PLC              34.0  Manufacturing
115471  AGILITY HEALTH INC        80.0  Services
115473  NORDIC AMERICAN OFFSHORE  44.0  Transportation, Communications, Electric, Gas and Sanitary service
115474  AAD                       54.0  Retail Trade

Answers

2

If you set up the SIC numbers in a dictionary, looking up the industry as needed is very straightforward:

Code:

sic = [x.strip().split(' ', 1) for x in """ 
    1-9 Agriculture, Forestry and Fishing 
    10-14 Mining 
    15-17 Construction 
    18-19 not used 
    20-39 Manufacturing 
    40-49 Transportation, Communications, ... 
    50-51 Wholesale Trade 
    52-59 Retail Trade 
    60-67 Finance, Insurance and Real Estate 
    70-89 Services 
    91-97 Public Administration 
    99-99 Nonclassifiable 
""".split('\n')[1:-1]] 

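# Expand each "lo-hi" range into one dictionary entry per code.
# range() excludes its end point, so add 1 to keep each upper bound (e.g. 39, 99).
sic_dict = {}
for span, name in sic:
    lo, hi = (int(y) for y in span.split('-'))
    for code in range(lo, hi + 1):
        sic_dict[code] = name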

Test code:

import pandas as pd
from io import StringIO

df = pd.read_fwf(StringIO(u""" 
    number conm      sic2 
    115466 ALLEGION PLC    34.0 
    115471 AGILITY HEALTH INC  80.0 
    115473 NORDIC AMERICAN OFFSHORE 44.0 
    115474 AAD      54.0 
    115477 DORIAN LPG LTD   44.0 
    115484 NOMAD FOODS LTD   20.0 
    115486 ATHENE HOLDING LTD  63.0 
    115490 MIDATECH PHARMA PLC  28.0 
    115495 MOTIF BIO PLC    28.0"""), header=1) 

df['industry'] = df.sic2.apply(lambda x: sic_dict[int(x)]) 

print(df) 

Result:

   number  conm                      sic2  industry
0  115466  ALLEGION PLC              34.0  Manufacturing
1  115471  AGILITY HEALTH INC        80.0  Services
2  115473  NORDIC AMERICAN OFFSHORE  44.0  Transportation, Communications, ...
3  115474  AAD                       54.0  Retail Trade
4  115477  DORIAN LPG LTD            44.0  Transportation, Communications, ...
5  115484  NOMAD FOODS LTD           20.0  Manufacturing
6  115486  ATHENE HOLDING LTD        63.0  Finance, Insurance and Real Estate
7  115490  MIDATECH PHARMA PLC       28.0  Manufacturing
8  115495  MOTIF BIO PLC             28.0  Manufacturing
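
For a large dataset, the same dictionary lookup can probably be done without a Python-level lambda by using Series.map, which also yields NaN for codes that are not in the dictionary instead of raising KeyError. The following is only a sketch under assumptions: the names sic_ranges and df are illustrative, only a few of the ranges are included, and the 68.0 row is invented to show the unmatched case.

```python
import pandas as pd

# Hypothetical subset of the ranges from the question, keyed by (low, high).
sic_ranges = {
    (20, 39): 'Manufacturing',
    (52, 59): 'Retail Trade',
    (60, 67): 'Finance, Insurance and Real Estate',
    (70, 89): 'Services',
}

# One dictionary entry per code; hi + 1 keeps the upper bound inclusive.
sic_dict = {
    code: name
    for (lo, hi), name in sic_ranges.items()
    for code in range(lo, hi + 1)
}

# Illustrative data; 68.0 is an invented code that falls in no range.
df = pd.DataFrame({'sic2': [34.0, 80.0, 54.0, 68.0]})

# map() does the lookup for the whole column; unmatched codes become NaN.
df['industry'] = df['sic2'].astype(int).map(sic_dict)
print(df)
```

Whether NaN or an error is preferable for unknown codes depends on the data; map makes the gaps visible so they can be filled or inspected afterwards.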
0
#Save your mapping table to a data frame 

df2 = pd.DataFrame({'id_end': {0: 9, 1: 14, 2: 17, 3: 19, 4: 39, 5: 49, 6: 51, 7: 59, 8: 67, 9: 89, 10: 97, 11: 99, 12: 1}, 
'id_start': {0: 1, 1: 10, 2: 15, 3: 18, 4: 20, 5: 40, 6: 50, 7: 52, 8: 60, 9: 70, 10: 91, 11: 99, 12: 0}, 
'industry': {0: 'Agriculture, Forestry and Fishing', 1: 'Mining', 2: 'Construction', 3: 'not used', 4: 'Manufacturing', 
    5: 'Transportation, Communications, Electric, Gas and Sanitary service', 
    6: 'Wholesale Trade', 7: 'Retail Trade', 8: 'Finance, Insurance and Real Estate', 9: 'Services', 
    10: 'Public Administration', 11: 'Nonclassifiable', 12: 'Agricultural Production Crops'}}) 

df2 = df2.sort_values(by='id_end') 

Out[354]: 
    id_end  id_start  industry
12       1         0  Agricultural Production Crops
0        9         1  Agriculture, Forestry and Fishing
1       14        10  Mining
2       17        15  Construction
3       19        18  not used
4       39        20  Manufacturing
5       49        40  Transportation, Communications, Electric, Gas ...
6       51        50  Wholesale Trade
7       59        52  Retail Trade
8       67        60  Finance, Insurance and Real Estate
9       89        70  Services
10      97        91  Public Administration
11      99        99  Nonclassifiable

#Map sic2 number to industry names: take the first range (sorted by id_end) whose end is >= the code
df['industry'] = df['sic2'].astype(int).apply(lambda x: df2.loc[df2.id_end >= x, 'industry'].iloc[0])


Out[352]: 
        conm                      sic2  industry
115466  ALLEGION PLC              34.0  Manufacturing
115471  AGILITY HEALTH INC        80.0  Services
115473  NORDIC AMERICAN OFFSHORE  44.0  Transportation, Communications, Electric, Gas ...
115474  AAD                       54.0  Retail Trade
115477  DORIAN LPG LTD            44.0  Transportation, Communications, Electric, Gas ...
115484  NOMAD FOODS LTD           20.0  Manufacturing
115486  ATHENE HOLDING LTD        63.0  Finance, Insurance and Real Estate
115490  MIDATECH PHARMA PLC       28.0  Manufacturing
115495  MOTIF BIO PLC             28.0  Manufacturing
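
The lookup above checks only id_end, so a code that falls in a gap between ranges (for example 68) would be assigned to the next range. One possible alternative, sketched here under assumptions, is to build a pandas IntervalIndex from both bounds and use get_indexer, which returns -1 for codes outside every range. The names ranges, intervals and pos are illustrative, the overlapping 0-1 row is left out because get_indexer needs non-overlapping intervals, and the 68.0 row is invented to show the unmatched case.

```python
import numpy as np
import pandas as pd

# Range table from the question, without the overlapping 0-1 row.
ranges = pd.DataFrame({
    'id_start': [1, 10, 15, 18, 20, 40, 50, 52, 60, 70, 91, 99],
    'id_end':   [9, 14, 17, 19, 39, 49, 51, 59, 67, 89, 97, 99],
    'industry': [
        'Agriculture, Forestry and Fishing', 'Mining', 'Construction',
        'not used', 'Manufacturing',
        'Transportation, Communications, Electric, Gas and Sanitary service',
        'Wholesale Trade', 'Retail Trade',
        'Finance, Insurance and Real Estate', 'Services',
        'Public Administration', 'Nonclassifiable',
    ],
})

# Closed on both sides so that 9, 14, 39, ... belong to their own range.
intervals = pd.IntervalIndex.from_arrays(
    ranges['id_start'], ranges['id_end'], closed='both')

# Illustrative data; 68.0 is an invented code that falls in a gap.
df = pd.DataFrame({'sic2': [34.0, 80.0, 44.0, 54.0, 68.0]})

# Position of the matching interval for every code, or -1 if there is none.
pos = intervals.get_indexer(df['sic2'].astype(int))

names = ranges['industry'].to_numpy()
# Look up names by position; codes without a range get None.
df['industry'] = np.where(pos >= 0, names[pos], None)
print(df)
```

This handles the whole column in one call rather than filtering df2 once per row, which may matter on a large dataset.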