2017-12-07

I have a list of texts produced by ngram, and I would like to add them to the original data.table as columns, so that each n-gram appears in its own column in R.

> prep_test 
                          prep_test 
1:      Women Athletic,Athletic Apparel,Apparel Pants,Pants Tights,Tights Leggings 
2:                  Beauty Makeup,Makeup Face 
3:                  Beauty Makeup,Makeup Face 
4:  Electronics Cell,Cell Phones,Phones Accessories,Accessories Cases,Cases Covers,Covers Skins 
5:                   Women Shoes,Shoes Boots 
6:             Men Men,Men s,s Accessories,Accessories Belts 
7: Electronics Cell,Cell Phones,Phones Accessories,Accessories Cell,Cell Phones,Phones Smartphones 
8:               Women Tops,Tops Blouses,Blouses Other 
9:      Women Athletic,Athletic Apparel,Apparel Pants,Pants Tights,Tights Leggings 
10:            Home Home,Home DÃ,DÃ cor,cor Home,Home Fragrance 



str(prep_test) 
Classes ‘data.table’ and 'data.frame': 10 obs. of 1 variable: 
$ prep_test:List of 10 
    ..$ : chr "Women Athletic" "Athletic Apparel" "Apparel Pants" "Pants Tights" ... 
    ..$ : chr "Beauty Makeup" "Makeup Face" 
    ..$ : chr "Beauty Makeup" "Makeup Face" 
    ..$ : chr "Electronics Cell" "Cell Phones" "Phones Accessories" "Accessories Cases" ... 
    ..$ : chr "Women Shoes" "Shoes Boots" 
    ..$ : chr "Men Men" "Men s" "s Accessories" "Accessories Belts" 
    ..$ : chr "Electronics Cell" "Cell Phones" "Phones Accessories" "Accessories Cell" ... 
    ..$ : chr "Women Tops" "Tops Blouses" "Blouses Other" 
    ..$ : chr "Women Athletic" "Athletic Apparel" "Apparel Pants" "Pants Tights" ... 
    ..$ : chr "Home Home" "Home DÃ" "DÃ cor" "cor Home" ... 
- attr(*, ".internal.selfref")=<externalptr> 

Here is my current code:

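library(ngram)  # provides ngram_asweka()

bigram_fun <- function(y){
    # Replace runs of punctuation/whitespace with a single space
    y <- gsub("[[:punct:][:blank:]]+", " ", y)
    # Build word bigrams (exactly 2 words per n-gram)
    y <- ngram_asweka(y, min=2, max=2)
    #y <- str_split_fixed(y, ",", n=Inf)
    #y <- unlist(y)
    return(y)
}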
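For readers without the ngram package, here is a base-R sketch of what bigram_fun is meant to produce: clean the string the same way, split it on spaces, and paste each word to the word that follows it. `make_bigrams` is a hypothetical helper, the input strings are hypothetical category strings reconstructed from the printed bigrams, and this is not claimed to match ngram_asweka() in every edge case.

```r
# Hypothetical base-R stand-in for ngram::ngram_asweka(y, min = 2, max = 2)
make_bigrams <- function(y) {
  y <- gsub("[[:punct:][:blank:]]+", " ", y)            # same cleanup as bigram_fun
  words <- strsplit(trimws(y), " ", fixed = TRUE)[[1]]  # split into single words
  if (length(words) < 2) return(character(0))           # no bigram possible
  paste(head(words, -1), tail(words, -1))               # word i + word i+1
}

make_bigrams("Women/Athletic Apparel/Pants, Tights, Leggings")
make_bigrams("Men/Men's Accessories/Belts")
```

The apostrophe in "Men's" counts as punctuation, which is why a stray "s" token shows up in bigrams such as "Men s" and "s Accessories".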

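library(data.table)

prep_test <- all[1:10, 9]                       # first 10 rows of column 9
prep_test <- apply(prep_test, 1, bigram_fun)    # list: one bigram vector per row
prep_test <- data.table(prep_test)              # stored as a single list column
prep_test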
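Because the bigram vectors have different lengths, one way to add them to the original table is to pad them to a common length and bind the padded columns on, row for row. This is a minimal base-R sketch. `orig` is a hypothetical stand-in for the original table (`all` in the question), and the column name `category_name` and its values are assumptions; the bigram vectors are copied from rows 2, 5 and 6 of the dput() output.

```r
# Hypothetical stand-in for the original table; column name is an assumption
orig <- data.frame(
  category_name = c("Beauty/Makeup/Face", "Women/Shoes/Boots",
                    "Men/Men's Accessories/Belts"),
  stringsAsFactors = FALSE
)

# Bigrams for those rows, copied from the dput() output (rows 2, 5 and 6)
bigrams <- list(
  c("Beauty Makeup", "Makeup Face"),
  c("Women Shoes", "Shoes Boots"),
  c("Men Men", "Men s", "s Accessories", "Accessories Belts")
)

# Column j holds the j-th bigram of every row; `[` past the end returns NA
n_max <- max(lengths(bigrams))
padded <- lapply(seq_len(n_max), function(j) vapply(bigrams, `[`, character(1), j))
names(padded) <- paste0("bigram_", seq_len(n_max))

# Attach the new columns next to the original ones
result <- cbind(orig, as.data.frame(padded, stringsAsFactors = FALSE))
result
```

If the original is a data.table, the same padded list could instead be added by reference, e.g. `orig[, names(padded) := padded]`.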

Output of dput:

> dput(prep_test) 
list(c("Women Athletic", "Athletic Apparel", "Apparel Pants", 
"Pants Tights", "Tights Leggings"), c("Beauty Makeup", "Makeup Face" 
), c("Beauty Makeup", "Makeup Face"), c("Electronics Cell", "Cell Phones", 
"Phones Accessories", "Accessories Cases", "Cases Covers", "Covers Skins" 
), c("Women Shoes", "Shoes Boots"), c("Men Men", "Men s", "s Accessories", 
"Accessories Belts"), c("Electronics Cell", "Cell Phones", "Phones Accessories", 
"Accessories Cell", "Cell Phones", "Phones Smartphones"), c("Women Tops", 
"Tops Blouses", "Blouses Other"), c("Women Athletic", "Athletic Apparel", 
"Apparel Pants", "Pants Tights", "Tights Leggings"), c("Home Home", 
"Home DÃ", "DÃ cor", "cor Home", "Home Fragrance")) 

Desired result: a separate column for each n-gram

Bigram 1   Bigram 2   Bigram 3    Bigram 4  ... 
"Women Athletic" "Athletic Apparel" "Apparel Pants"  "Pants Tights"... 
"Beauty Makeup" "Makeup Face"  NA     NA   ... 
"Beauty Makeup" "Makeup Face"  NA     NA   ... 
"Electronics Cell" "Cell Phones"  "Phones Accessories" "Accessories Cases" 
"Women Shoes"  "Shoes Boots"  NA     NA 
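The desired shape can be sketched in base R, without plyr or reshape2: pad each vector with NA up to the longest length and stack the vectors as rows. The list below is a subset copied from the dput() output; the variable names are illustrative.

```r
bigrams <- list(
  c("Women Athletic", "Athletic Apparel", "Apparel Pants", "Pants Tights", "Tights Leggings"),
  c("Beauty Makeup", "Makeup Face"),
  c("Women Shoes", "Shoes Boots")
)

n_max <- max(lengths(bigrams))  # length of the longest row
# x[seq_len(n_max)] pads a shorter vector with NA; vapply() gives one column per row
padded <- vapply(bigrams, function(x) x[seq_len(n_max)], character(n_max))
wide_df <- as.data.frame(t(padded), stringsAsFactors = FALSE)  # one row per input
names(wide_df) <- paste("Bigram", seq_len(n_max))
wide_df
```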

How can I make this work?


Please upload a dput of your data so the code is reproducible. – Chris


In your question, prep_test is a data.table object, but the dput output contains a list, not a data.table. Am I missing something? – jazzurro

Answers


Beginner here; sorry for the poor question, and thanks for any answers:

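library(plyr)

# mylist is the list of bigram vectors (the dput() output above); each vector becomes
# a one-row data frame (V1, V2, ...) and rbind.fill() pads missing columns with NA
df <- rbind.fill(lapply(mylist, function(x) as.data.frame(t(x), stringsAsFactors = FALSE)))
colnames(df) <- paste0("Bigram ", seq_len(ncol(df)))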
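The reason rbind.fill() can line the rows up is that `as.data.frame(t(x))` turns each vector into a single-row data frame with automatically named columns V1, V2, and so on. A short base-R illustration on one row (the variable names are illustrative):

```r
x <- c("Beauty Makeup", "Makeup Face")

# t() turns the length-2 vector into a 1 x 2 matrix; as.data.frame() then
# names the columns V1, V2, which is what rbind.fill() matches on
one_row <- as.data.frame(t(x), stringsAsFactors = FALSE)
one_row
names(one_row)
```

Rows with fewer bigrams simply lack the higher-numbered V columns, and rbind.fill() fills those with NA.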

Output:

  Bigram 1   Bigram 2   Bigram 3   Bigram 4  Bigram 5   Bigram 6 
1 Women Athletic Athletic Apparel  Apparel Pants  Pants Tights Tights Leggings    <NA> 
2  Beauty Makeup  Makeup Face    <NA>    <NA>   <NA>    <NA> 
3  Beauty Makeup  Makeup Face    <NA>    <NA>   <NA>    <NA> 
4 Electronics Cell  Cell Phones Phones Accessories Accessories Cases Cases Covers  Covers Skins 
5  Women Shoes  Shoes Boots    <NA>    <NA>   <NA>    <NA> 
6   Men Men   Men s  s Accessories Accessories Belts   <NA>    <NA> 
7 Electronics Cell  Cell Phones Phones Accessories Accessories Cell  Cell Phones Phones Smartphones 
8  Women Tops  Tops Blouses  Blouses Other    <NA>   <NA>    <NA> 
9 Women Athletic Athletic Apparel  Apparel Pants  Pants Tights Tights Leggings    <NA> 
10  Home Home   Home DÃ    DÃ cor   cor Home Home Fragrance    <NA> 

Hope this helps!


This works! Awesome, thanks Florian :D –


You can convert the bigrams into data frames, bind them into a single long ("melted") data frame, and then cast that into a wide-format tidy data frame, as follows:

theBigrams <- list(c("Women Athletic", "Athletic Apparel", "Apparel Pants", 
"Pants Tights", "Tights Leggings"), c("Beauty Makeup", "Makeup Face"), 
c("Beauty Makeup", "Makeup Face"), c("Electronics Cell", "Cell Phones", 
"Phones Accessories", "Accessories Cases", "Cases Covers", "Covers Skins" 
), c("Women Shoes", "Shoes Boots"), c("Men Men", "Men s", "s Accessories", 
"Accessories Belts"), c("Electronics Cell", "Cell Phones", "Phones Accessories", 
"Accessories Cell", "Cell Phones", "Phones Smartphones"), c("Women Tops", 
"Tops Blouses", "Blouses Other"), c("Women Athletic", "Athletic Apparel", 
"Apparel Pants", "Pants Tights", "Tights Leggings"), c("Home Home", 
"Home DÃ", "DÃ cor", "cor Home", "Home Fragrance")) 

meltedBigrams <- do.call(rbind, lapply(seq_along(theBigrams), function(i) {
    x <- theBigrams[[i]]
    bigram <- seq_along(x)          # position of each bigram within its row
    id <- rep(i, length(x))         # index of the source row
    data.frame(id, bigram, value = x, stringsAsFactors = FALSE)
}))
library(reshape2)
castData <- dcast(meltedBigrams, id ~ bigram, value.var = "value")
castData
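If you would rather not depend on reshape2, base R's stats::reshape() can do the same long-to-wide step. This is a minimal sketch on a two-row subset of the data; the wide columns come out named value.1, value.2, ... rather than 1, 2, ..., and positions a row does not have are filled with NA.

```r
theBigrams <- list(
  c("Beauty Makeup", "Makeup Face"),
  c("Men Men", "Men s", "s Accessories", "Accessories Belts")
)

# Long format: one row per (id, bigram position, value)
meltedBigrams <- do.call(rbind, lapply(seq_along(theBigrams), function(i) {
  x <- theBigrams[[i]]
  data.frame(id = i, bigram = seq_along(x), value = x, stringsAsFactors = FALSE)
}))

# Spread the bigram positions into columns value.1, value.2, ...
wide <- reshape(meltedBigrams, idvar = "id", timevar = "bigram", direction = "wide")
wide
```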

... and the output:

> castData 
    id    1    2     3     4    5     6 
1 1 Women Athletic Athletic Apparel  Apparel Pants  Pants Tights Tights Leggings    <NA> 
2 2 Beauty Makeup  Makeup Face    <NA>    <NA>   <NA>    <NA> 
3 3 Beauty Makeup  Makeup Face    <NA>    <NA>   <NA>    <NA> 
4 4 Electronics Cell  Cell Phones Phones Accessories Accessories Cases Cases Covers  Covers Skins 
5 5  Women Shoes  Shoes Boots    <NA>    <NA>   <NA>    <NA> 
6 6   Men Men   Men s  s Accessories Accessories Belts   <NA>    <NA> 
7 7 Electronics Cell  Cell Phones Phones Accessories Accessories Cell  Cell Phones Phones Smartphones 
8 8  Women Tops  Tops Blouses  Blouses Other    <NA>   <NA>    <NA> 
9 9 Women Athletic Athletic Apparel  Apparel Pants  Pants Tights Tights Leggings    <NA> 
10 10  Home Home   Home DÃ    DÃ cor   cor Home Home Fragrance    <NA> 
> 

Thanks Len Greski :D This works too! Thanks so much! –
