Calling NLTK's concordance - how to get the text before/after a word that was used?

I would like to find the text that comes after an instance returned by concordance. For example, the example in the 'Searching Text' section shows a concordance for the word 'monstrous'. How can I get the words that come right after each instance of 'monstrous'?

Source

2012-01-17 dev.e.loper

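import nltk
import nltk.book as book
text1 = book.text1
# Build a case-insensitive index of token positions.
c = nltk.ConcordanceIndex(text1.tokens, key=lambda s: s.lower())
# For every occurrence of 'monstrous', print the token that follows it.
print([text1.tokens[offset+1] for offset in c.offsets('monstrous')])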

This yields:

['size', 'bulk', 'clubs', 'cannibal', 'and', 'fable', 'Pictures', 'pictures', 'stories', 'cabinet', 'size']

I found this by looking up how the concordance method is defined.

This shows that text1.concordance is defined in /usr/lib/python2.7/dist-packages/nltk/text.py:

In [107]: text1.concordance? 
Type:  instancemethod 
Base Class: <type 'instancemethod'> 
String Form: <bound method Text.concordance of <Text: Moby Dick by Herman Melville 1851>> 
Namespace: Interactive 
File:  /usr/lib/python2.7/dist-packages/nltk/text.py

In that file you can see how the ConcordanceIndex object is created:

def concordance(self, word, width=79, lines=25):
    ...
    self._concordance_index = ConcordanceIndex(self.tokens,
                                               key=lambda s:s.lower())
    ...
    self._concordance_index.print_concordance(word, width, lines)

In the same file you will also find:

class ConcordanceIndex(object):
    def __init__(self, tokens, key=lambda x:x):
        ...
    def print_concordance(self, word, width=75, lines=25):
        ...
        offsets = self.offsets(word)
        ...
        right = ' '.join(self._tokens[i+1:i+context])

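With some experimentation in the IPython interpreter, this shows that self.offsets('monstrous') returns a list of numbers (offsets) where the word monstrous can be found. The actual word at each position is self._tokens[offset], which is the same as text1.tokens[offset].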
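To make the idea of offsets concrete, here is a minimal pure-Python sketch of a position index. It is only an illustration of the behaviour described above, not NLTK's actual implementation; the helper name build_offsets and the sample sentence are made up for this example.

```python
from collections import defaultdict

def build_offsets(tokens, key=lambda s: s):
    """Map key(token) to the positions where it occurs (hypothetical helper)."""
    index = defaultdict(list)
    for position, token in enumerate(tokens):
        index[key(token)].append(position)
    return index

# Made-up sample text; with NLTK you would use text1.tokens instead.
tokens = "A monstrous size and a Monstrous bulk".split()

# Lower-casing the key makes the lookup case-insensitive.
index = build_offsets(tokens, key=lambda s: s.lower())
offsets = index["monstrous"]

# Each offset points back at the matched token in the original list.
matched = [tokens[i] for i in offsets]
print(offsets)
print(matched)
```

The lower-casing key plays the same role as key=lambda s: s.lower() in the ConcordanceIndex call: both 'monstrous' and 'Monstrous' end up under one entry, while the offsets still point at the original, unmodified tokens.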

So the word that follows monstrous is given by text1.tokens[offset+1].
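The same indexing works for the words before a match, and for more than one word on either side. The following is a rough sketch under a few assumptions: the helper name context and the sample tokens are hypothetical, and the offsets are computed by hand here, whereas with NLTK they would come from c.offsets('monstrous') and the tokens from text1.tokens. Slices are used instead of tokens[offset+1] so that a match on the very last token does not raise an IndexError.

```python
def context(tokens, offset, before=1, after=1):
    """Return (left, right) word lists around tokens[offset] (hypothetical helper)."""
    # Clamp the start so a negative index never wraps around to the end.
    start = max(0, offset - before)
    left = tokens[start:offset]
    # Slicing past the end of a list is safe and simply returns fewer items.
    right = tokens[offset + 1:offset + 1 + after]
    return left, right

# Made-up sample text in which the last token is itself a match.
tokens = "of a monstrous size and most monstrous".split()
offsets = [i for i, t in enumerate(tokens) if t.lower() == "monstrous"]

results = [context(tokens, i, before=2, after=2) for i in offsets]
for left, right in results:
    print(" ".join(left), "|", " ".join(right))
```

For the second match there is nothing to the right, so the right-hand list is empty rather than an error.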

Source

2012-01-17 17:11:18 unutbu
