2016-10-22 2 views
3

NLTK's spell checker is not working correctly

I want to spell-check a sentence in Python using NLTK. The built-in spell checker is not working correctly: it reports 'with' and 'and' as misspelled.

def tokens(sent):
    return nltk.word_tokenize(sent)

def SpellChecker(line):
    for i in tokens(line):
        strip = i.rstrip()
        if not WN.synsets(strip):
            print("Wrong spellings : " + i)
        else:
            print("No mistakes :" + i)

def removePunct(str):
    return "".join(c for c in str if c not in ('!', '.', ':', ','))

l = "Attempting artiness With black & white and clever camera angles, the movie disappointed - became even more ridiculous - as the acting was poor and the plot and lines almost non-existent. "
noPunct = removePunct(l.lower())
if(SpellChecker(noPunct)):
    print(l)
    print(noPunct)

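Can someone tell me why?

Answers

3

It reports those words as misspelled because they are stopwords, which are not contained in WordNet (check the FAQs). So, instead, you can use the stopwords from the NLTK corpus to check for such words.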

# Add these lines:
import nltk
from nltk.corpus import wordnet as WN
from nltk.corpus import stopwords

# Needs the NLTK tokenizer, WordNet and stopwords data (install with nltk.download).
stop_words_en = set(stopwords.words('english'))

def tokens(sent):
    return nltk.word_tokenize(sent)

def SpellChecker(line):
    for i in tokens(line):
        strip = i.rstrip()
        if not WN.synsets(strip):
            if strip in stop_words_en:  # <--- Check whether it's in stopword list
                print("No mistakes :" + i)
            else:
                print("Wrong spellings : " + i)
        else:
            print("No mistakes :" + i)

def removePunct(text):
    # Drop a few punctuation characters before tokenizing.
    return "".join(c for c in text if c not in ('!', '.', ':', ','))

l = "Attempting artiness With black & white and clever camera angles, the movie disappointed - became even more ridiculous - as the acting was poor and the plot and lines almost non-existent. "

noPunct = removePunct(l.lower())
# SpellChecker prints its results and returns None, so call it directly
# instead of testing its return value in an if statement.
SpellChecker(noPunct)
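If you would rather collect the suspect words than print them, the same stopword-first check can be written as a function that returns a list. This is only a rough sketch, not part of the code above: `find_misspelled`, `in_lexicon`, `demo_stop_words` and `demo_lexicon` are made-up names, and the tiny sets stand in for WordNet and the NLTK stopword list so that it runs without any NLTK data.

```python
from typing import Callable, Iterable, List, Set


def find_misspelled(
    words: Iterable[str],
    in_lexicon: Callable[[str], bool],
    stop_words: Set[str],
) -> List[str]:
    """Return the words that are neither stopwords nor known to the lexicon."""
    misspelled = []
    for word in words:
        w = word.strip().lower()
        if not w or not w.isalpha():
            # Skip empty strings and tokens containing non-letters, such as "&" or "-".
            continue
        if w in stop_words or in_lexicon(w):
            # Stopwords are accepted before the lexicon is even consulted.
            continue
        misspelled.append(w)
    return misspelled


# Tiny stand-in data (hypothetical values) so the sketch runs without NLTK.
demo_stop_words = {"with", "and", "the"}
demo_lexicon = {"black", "white", "movie", "clever"}

result = find_misspelled(
    ["With", "black", "&", "white", "and", "clevr", "movie"],
    in_lexicon=lambda w: w in demo_lexicon,
    stop_words=demo_stop_words,
)
print(result)  # ['clevr']
```

With NLTK loaded as in the answer, you could presumably pass `in_lexicon=lambda w: bool(WN.synsets(w))` and `stop_words=stop_words_en`. Note that this sketch skips hyphenated tokens such as "non-existent" entirely, which may or may not be what you want.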