How to get a list of all stemmed words along with their original form after running stemDocument in R

I am trying to get a list of all stemmed words together with their original forms.

I am looking for an answer like this, in a data frame:

original_word  stemmed
Impressed      Impress
shipping       ship
very           veri
helpful        help
wonderful      wonder
experience     experi

Source

2017-09-21 Shruthi S R

Here is an example of what I am doing:

library(tm)

text <- c("Very Impressed with the shipping time, it arrived a few days earlier than expected",
          "it was very helpful",
          "It was a wonderful experience")

# Build a corpus from the character vector and stem every document
corpus <- Corpus(VectorSource(text))
corpus <- tm_map(corpus, stemDocument)

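Here is something that may help you. The SnowballC package has a function called wordStem(), and you can use it as follows. Since I use unnest_tokens() from the tidytext package, I created a data frame first. That function splits the text into words and produces a long-format data set. It looks like you want to remove stop words, so I used filter() for that. The last step is the important one for you: I used wordStem() from SnowballC to extract the stem of each word remaining in the data. The result may not be exactly what you want, but I hope it helps to some extent.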

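library(dplyr)
library(tidytext)
library(SnowballC)

# `text` is the character vector defined in the question.
# tibble() replaces the deprecated data_frame().
mydf <- tibble(id = seq_along(text),
               text = text)

data(stop_words)

mydf %>%
  unnest_tokens(input = text, output = word) %>%  # one lowercase word per row
  filter(!word %in% stop_words$word) %>%          # drop stop words
  mutate(stem = wordStem(word))                   # Porter stemmer by default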

#       id word       stem
#    <int> <chr>      <chr>
#  1     1 impressed  impress
#  2     1 shipping   ship
#  3     1 time       time
#  4     1 arrived    arriv
#  5     1 days       dai
#  6     1 earlier    earlier
#  7     1 expected   expect
#  8     2 helpful    help
#  9     3 wonderful  wonder
# 10     3 experience experi

Source

2017-09-21 12:07:05 jazzurro

It should be wordStem(word, "english") unless you want the Porter stemmer.

This is a bit more efficient than @jazzurro's answer:

library("corpus")

text <- c("Very Impressed with the shipping time, it arrived a few days earlier than expected",
          "it was very helpful",
          "It was a wonderful experience")

# Unique word types across all texts, without English stop words or punctuation
word <- text_types(text, collapse = TRUE, drop = stopwords_en, drop_punct = TRUE)

# Stem each type with the Snowball English stemmer
stem <- SnowballC::wordStem(word, "english")
data.frame(word, stem)

Result:

         word    stem
1     arrived   arriv
2        days     day
3     earlier earlier
4    expected  expect
5  experience  experi
6     helpful    help
7   impressed impress
8    shipping    ship
9        time    time
10  wonderful  wonder

(The text_types function also accepts tm corpus objects, if that matters to you.)
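
The two result tables on this page disagree on "days" (dai in the first, day in the second). That comes from the stemmer choice: wordStem() uses the Porter stemmer when no language is given, while "english" selects the Snowball English stemmer. As a minimal sketch, assuming only that SnowballC is installed (the word list and the name `comparison` are made up for illustration), the two can be put side by side like this:

```r
library(SnowballC)

# A few words taken from the example sentences in the question
words <- c("days", "impressed", "shipping", "experience")

# No language argument: wordStem() falls back to language = "porter"
porter <- wordStem(words)

# Explicitly request the Snowball English stemmer
english <- wordStem(words, language = "english")

# Original word next to both stems
comparison <- data.frame(word = words,
                         porter = porter,
                         english = english,
                         stringsAsFactors = FALSE)
print(comparison)

# Keep only the rows where the two stemmers disagree
comparison[comparison$porter != comparison$english, ]
```

If the stems are meant to be read by people, the "english" variant is probably the safer default; the Porter default mainly matters when results must match an older pipeline.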

Source

2017-09-22 01:42:05
