2017-01-24

How can I change the order of a variable with categorical values in dplyr? I performed some manipulations with dplyr to arrive at the following data frame:

df 

    cluster.kmeans   variable max  mean median min  sd 
1    1  MonthlySMS 191 90.32258 71.0 8 56.83801 
2    1 SixMonthlyData 1085 567.09677 573.0 109 275.46994 
3    1 SixMonthlySMS 208 94.38710 86.0 29 56.27828 
4    1 ThreeMonthlyData 1038 563.03226 573.0 94 275.51340 
5    1 ThreeMonthlySMS 199 88.35484 76.0 6 59.15491 
6    2  MonthlySMS 155 53.18815 57.0 1 31.64533 
7    2 SixMonthlyData 574 280.27352 280.5 -48 139.75252 
8    2 SixMonthlySMS 167 57.77526 47.0 1 33.49210 
9    2 ThreeMonthlyData 548 280.89547 279.0 -11 137.54755 
10    2 ThreeMonthlySMS 149 53.68641 50.5 3 31.40001 
11    3  MonthlySMS 215 135.60202 137.0 49 34.09794 
12    3 SixMonthlyData 1046 541.76322 557.0 2 258.90622 
13    3 SixMonthlySMS 314 152.40302 152.0 27 45.55642 
14    3 ThreeMonthlyData 1064 541.50378 558.0 10 255.35560 
15    3 ThreeMonthlySMS 240 146.00756 146.0 54 37.06427 
16    4  MonthlySMS 136 49.93980 54.5 1 31.47778 
17    4 SixMonthlyData 1091 788.09365 805.0 503 145.67031 
18    4 SixMonthlySMS 190 57.50167 46.0 1 33.66157 
19    4 ThreeMonthlyData 1073 785.19398 799.5 500 142.90054 
20    4 ThreeMonthlySMS 141 50.88796 46.0 1 31.07977 

I would like to order the variable column based on these strings:

top.vars_kmeans 
[1] "ThreeMonthlySMS" "SixMonthlyData" "ThreeMonthlyData" 
[4] "MonthlySMS"  "SixMonthlySMS" 

I can do it using sqldf as follows:

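library(sqldf) 
# Lookup table: each category paired with its desired rank (1 to 5)
a <- c(1,2,3,4,5) 
a <- data.frame(top.vars_kmeans,a) 
# Attach the rank to every row by joining on the category name
a <- sqldf('select a1.* ,b1.a from "MS.DATA.STATS.KMEANS" a1 inner join a b1 
      on a1.variable=b1."top.vars_kmeans"') 
# Sort by cluster first, then by the desired rank
a <- sqldf('select * from a order by "cluster.kmeans",a') 
# Drop the helper rank column
a$a <- NULL 
a 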

    cluster.kmeans   variable max  mean median min  sd 
1    1 ThreeMonthlySMS 199 88.35484 76.0 6 59.15491 
2    1 SixMonthlyData 1085 567.09677 573.0 109 275.46994 
3    1 ThreeMonthlyData 1038 563.03226 573.0 94 275.51340 
4    1  MonthlySMS 191 90.32258 71.0 8 56.83801 
5    1 SixMonthlySMS 208 94.38710 86.0 29 56.27828 
6    2 ThreeMonthlySMS 149 53.68641 50.5 3 31.40001 
7    2 SixMonthlyData 574 280.27352 280.5 -48 139.75252 
8    2 ThreeMonthlyData 548 280.89547 279.0 -11 137.54755 
9    2  MonthlySMS 155 53.18815 57.0 1 31.64533 
10    2 SixMonthlySMS 167 57.77526 47.0 1 33.49210 
11    3 ThreeMonthlySMS 240 146.00756 146.0 54 37.06427 
12    3 SixMonthlyData 1046 541.76322 557.0 2 258.90622 
13    3 ThreeMonthlyData 1064 541.50378 558.0 10 255.35560 
14    3  MonthlySMS 215 135.60202 137.0 49 34.09794 
15    3 SixMonthlySMS 314 152.40302 152.0 27 45.55642 
16    4 ThreeMonthlySMS 141 50.88796 46.0 1 31.07977 
17    4 SixMonthlyData 1091 788.09365 805.0 503 145.67031 
18    4 ThreeMonthlyData 1073 785.19398 799.5 500 142.90054 
19    4  MonthlySMS 136 49.93980 54.5 1 31.47778 
20    4 SixMonthlySMS 190 57.50167 46.0 1 33.66157 

I would just like to know how to do this in dplyr, as it would improve my understanding of this amazing package...

Help needed here!


@Cath. This is not a dupe of that question. – akrun

Answers


We can use match within arrange:

library(dplyr) 
# match() returns each variable's position in top.vars_kmeans, used as the secondary sort key
a %>% 
    arrange(cluster.kmeans, match(variable, top.vars_kmeans)) 
# cluster.kmeans   variable max  mean median min  sd 
#1    1 ThreeMonthlySMS 199 88.35484 76.0 6 59.15491 
#2    1 SixMonthlyData 1085 567.09677 573.0 109 275.46994 
#3    1 ThreeMonthlyData 1038 563.03226 573.0 94 275.51340 
#4    1  MonthlySMS 191 90.32258 71.0 8 56.83801 
#5    1 SixMonthlySMS 208 94.38710 86.0 29 56.27828 
#6    2 ThreeMonthlySMS 149 53.68641 50.5 3 31.40001 
#7    2 SixMonthlyData 574 280.27352 280.5 -48 139.75252 
#8    2 ThreeMonthlyData 548 280.89547 279.0 -11 137.54755 
#9    2  MonthlySMS 155 53.18815 57.0 1 31.64533 
#10    2 SixMonthlySMS 167 57.77526 47.0 1 33.49210 
#11    3 ThreeMonthlySMS 240 146.00756 146.0 54 37.06427 
#12    3 SixMonthlyData 1046 541.76322 557.0 2 258.90622 
#13    3 ThreeMonthlyData 1064 541.50378 558.0 10 255.35560 
#14    3  MonthlySMS 215 135.60202 137.0 49 34.09794 
#15    3 SixMonthlySMS 314 152.40302 152.0 27 45.55642 
#16    4 ThreeMonthlySMS 141 50.88796 46.0 1 31.07977 
#17    4 SixMonthlyData 1091 788.09365 805.0 503 145.67031 
#18    4 ThreeMonthlyData 1073 785.19398 799.5 500 142.90054 
#19    4  MonthlySMS 136 49.93980 54.5 1 31.47778 
#20    4 SixMonthlySMS 190 57.50167 46.0 1 33.66157 
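For comparison, the same match()-as-sort-key idea works in base R with order(). This is a minimal sketch, assuming a hypothetical toy data frame (toy) built from the max column of clusters 1 and 2 above; it is not part of the original answer.

```r
# Desired order of the categories (from the question)
top.vars_kmeans <- c("ThreeMonthlySMS", "SixMonthlyData", "ThreeMonthlyData",
                     "MonthlySMS", "SixMonthlySMS")

# Hypothetical toy data: rows in the same (alphabetical) order as the question's df
toy <- data.frame(
  cluster.kmeans = rep(1:2, each = 5),
  variable = rep(c("MonthlySMS", "SixMonthlyData", "SixMonthlySMS",
                   "ThreeMonthlyData", "ThreeMonthlySMS"), times = 2),
  max = c(191, 1085, 208, 1038, 199, 155, 574, 167, 548, 149),
  stringsAsFactors = FALSE
)

# Position of each row's category within the desired order
pos <- match(toy$variable, top.vars_kmeans)

# Sort by cluster first, then by that position
sorted <- toy[order(toy$cluster.kmeans, pos), ]
print(sorted)
```

Categories absent from top.vars_kmeans get NA from match(), and order() puts those rows last within each cluster by default (na.last = TRUE).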

You can redefine your factor (or an ordered factor) with its levels in the desired order (stored in top.vars_kmeans, for example):

a$variable <- factor(a$variable, levels = top.vars_kmeans) 
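Once variable is re-leveled this way, sorting on it follows the level order instead of alphabetical order. A minimal base-R sketch, assuming a hypothetical toy data frame (toy) built from a few rows of the output above; dplyr's arrange(cluster.kmeans, variable) should give the same row order, since arrange() also sorts factors by their levels.

```r
# Desired order of the categories (from the question)
top.vars_kmeans <- c("ThreeMonthlySMS", "SixMonthlyData", "ThreeMonthlyData",
                     "MonthlySMS", "SixMonthlySMS")

# Hypothetical toy data: rows in alphabetical order of variable
toy <- data.frame(
  cluster.kmeans = rep(1:2, each = 5),
  variable = rep(c("MonthlySMS", "SixMonthlyData", "SixMonthlySMS",
                   "ThreeMonthlyData", "ThreeMonthlySMS"), times = 2),
  max = c(191, 1085, 208, 1038, 199, 155, 574, 167, 548, 149),
  stringsAsFactors = FALSE
)

# Re-level: the factor's level order is now the desired order
toy$variable <- factor(toy$variable, levels = top.vars_kmeans)

# order() on a factor sorts by level position, not alphabetically
sorted <- toy[order(toy$cluster.kmeans, toy$variable), ]
print(sorted)
```

Any value not listed in levels becomes NA, so top.vars_kmeans needs to contain every category present in the data.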

See also the help, online or via ?factor.

If you want to order the whole data.frame, go with akrun's answer.


You can try group_by and slice:

library(dplyr) 
df %>% group_by(cluster.kmeans) %>% slice(match(top.vars_kmeans, variable)) 

# cluster.kmeans   variable max  mean median min  sd 
#   (int)   (fctr) (int)  (dbl) (dbl) (int)  (dbl) 
#1    1 ThreeMonthlySMS 199 88.35484 76.0  6 59.15491 
#2    1 SixMonthlyData 1085 567.09677 573.0 109 275.46994 
#3    1 ThreeMonthlyData 1038 563.03226 573.0 94 275.51340 
#4    1  MonthlySMS 191 90.32258 71.0  8 56.83801 
#5    1 SixMonthlySMS 208 94.38710 86.0 29 56.27828 
#6    2 ThreeMonthlySMS 149 53.68641 50.5  3 31.40001 
#7    2 SixMonthlyData 574 280.27352 280.5 -48 139.75252 
#8    2 ThreeMonthlyData 548 280.89547 279.0 -11 137.54755 
#9    2  MonthlySMS 155 53.18815 57.0  1 31.64533 
#10    2 SixMonthlySMS 167 57.77526 47.0  1 33.49210 
#11    3 ThreeMonthlySMS 240 146.00756 146.0 54 37.06427 
#12    3 SixMonthlyData 1046 541.76322 557.0  2 258.90622 
#13    3 ThreeMonthlyData 1064 541.50378 558.0 10 255.35560 
#14    3  MonthlySMS 215 135.60202 137.0 49 34.09794 
#15    3 SixMonthlySMS 314 152.40302 152.0 27 45.55642 
#16    4 ThreeMonthlySMS 141 50.88796 46.0  1 31.07977 
#17    4 SixMonthlyData 1091 788.09365 805.0 503 145.67031 
#18    4 ThreeMonthlyData 1073 785.19398 799.5 500 142.90054 
#19    4  MonthlySMS 136 49.93980 54.5  1 31.47778 
#20    4 SixMonthlySMS 190 57.50167 46.0  1 33.66157
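This works because match(top.vars_kmeans, variable) is evaluated within each cluster and returns, for each desired category in turn, the row number where that category sits in the group; slice() then keeps those rows in exactly that order. A minimal base-R sketch of the same per-group logic, assuming a hypothetical toy data frame (toy) built from clusters 1 and 2 above:

```r
# Desired order of the categories (from the question)
top.vars_kmeans <- c("ThreeMonthlySMS", "SixMonthlyData", "ThreeMonthlyData",
                     "MonthlySMS", "SixMonthlySMS")

# Hypothetical toy data: rows in alphabetical order of variable
toy <- data.frame(
  cluster.kmeans = rep(1:2, each = 5),
  variable = rep(c("MonthlySMS", "SixMonthlyData", "SixMonthlySMS",
                   "ThreeMonthlyData", "ThreeMonthlySMS"), times = 2),
  max = c(191, 1085, 208, 1038, 199, 155, 574, 167, 548, 149),
  stringsAsFactors = FALSE
)

# Emulate group_by() + slice(): handle each cluster separately
pieces <- lapply(split(toy, toy$cluster.kmeans), function(g) {
  # For each desired category, the row index of that category in this group
  g[match(top.vars_kmeans, g$variable), ]
})

# Stack the reordered groups back together (clusters stay in ascending order)
sorted <- do.call(rbind, pieces)
print(sorted)
```

Note the argument order: match(top.vars_kmeans, variable) yields row indices to pick, whereas akrun's match(variable, top.vars_kmeans) yields a sort key. In this base-R version, a category missing from a cluster would produce an all-NA row, and categories not listed in top.vars_kmeans are dropped.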