How to find double words in a file

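I'm having trouble with my code. I'm trying to find words that are repeated back to back in a file, such as "the the", and print the line they occur on. So far my code works for the line count, but it loops over the whole file and gives me every word that repeats anywhere, not just a word that comes right after itself. What do I need to change so that it only counts double words?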

my_file = input("Enter file name: ")
lst = []
count = 1
with open(my_file, "r") as dup:
    for line in dup:
        linedata = line.split()
        for word in linedata:
            if word not in lst:
                lst.append(word)
            else:
                print("Found word: {""} on line {}".format(word, count))
                count = count + 1
dup.close()

Source

2017-04-03 RandomUser

Just reset `lst = []` at each line iteration. –

@Jean-FrançoisFabre That would detect any word duplicated within a line, not only adjacent ones. – Maciek
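The difference raised in these comments shows up on a single line of text. Here is a minimal sketch, assuming words are separated by whitespace; the helper names `repeated_anywhere` and `repeated_adjacent` are made up for illustration and are not from the thread:

```python
def repeated_anywhere(line):
    """Words that were already seen earlier in the same line, in any position."""
    seen = set()
    repeats = []
    for word in line.split():
        if word in seen:
            repeats.append(word)
        else:
            seen.add(word)
    return repeats


def repeated_adjacent(line):
    """Words that directly follow an identical word."""
    words = line.split()
    # Pair every word with its successor and keep the pairs that match.
    return [b for a, b in zip(words, words[1:]) if a == b]


sample = "the cat saw the the dog and the cat"
print(repeated_anywhere(sample))  # every later "the" and the second "cat"
print(repeated_adjacent(sample))  # only the back-to-back "the"
```

Resetting the list per line gives the first behaviour; the question asks for the second.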

Here is a pure answer to ONLY the question asked:

"What do I need to change so that it only counts double words?"

Here you are:

my_file = input("Enter file name: ")
count = 0
with open(my_file, "r") as dup:
    for line in dup:
        count = count + 1  # current line number, starting at 1
        linedata = line.split()
        lastWord = ''  # reset per line so doubles are not matched across lines
        for word in linedata:
            if word == lastWord:
                print("Found word: {} on line {}".format(word, count))
            lastWord = word
# no dup.close() needed: the with block closes the file
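Note that the comparison above is exact, so "The the" or "end end." would not be reported. If that matters, one possible variant is sketched below. The normalisation (lower-casing and stripping surrounding punctuation) is an assumption about what should count as the same word, and the function name `find_adjacent_doubles` is hypothetical:

```python
import io
import string


def find_adjacent_doubles(lines):
    """Yield (line_number, word) for each word that repeats the word right before it."""
    for line_no, line in enumerate(lines, start=1):
        last_word = None
        for raw in line.split():
            # Assumption: ignore case and leading/trailing punctuation.
            word = raw.strip(string.punctuation).lower()
            if word and word == last_word:
                yield line_no, word
            last_word = word


# io.StringIO stands in for an open file here; any iterable of lines works.
text = io.StringIO("This is is fine.\nNothing here.\nThe the end.\n")
for line_no, word in find_adjacent_doubles(text):
    print("Found word: {} on line {}".format(word, line_no))
```

Because the function accepts any iterable of lines, it can be fed an open file object directly, for example `find_adjacent_doubles(open(my_file))` inside a `with` block.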

Source

2017-04-03 15:50:02 Claudio

my_file = input("Enter file name: ")
with open(my_file, "r") as dup:
    for line_num, line in enumerate(dup, start=1):  # 1-based line numbers
        words_in_line = line.split()
        duplicates = [word for i, word in enumerate(words_in_line[1:]) if words_in_line[i] == word]
        # duplicates now holds the words in this line that repeat the word before them
        # do whatever you want with it

Source

2017-04-03 13:45:46 Maciek

enumerate already starts at 0, so it should be `words_in_line[i]` ;) – swenzel

@swenzel Right, thanks! Fixed now. – Maciek
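The index alignment discussed in these comments can be checked in isolation: `enumerate` counts from 0 while the slice `[1:]` starts at the second word, so index `i` in the original list is the word just before `word`. The sketch below uses made-up sample lines. The `shifted_by_one` function is a hypothetical off-by-one variant included only for contrast; it is not claimed to be what the answer originally contained:

```python
def adjacent_doubles(words):
    # enumerate starts at 0 and the slice starts at index 1,
    # so words[i] is the word immediately before `word`.
    return [word for i, word in enumerate(words[1:]) if words[i] == word]


def shifted_by_one(words):
    # Hypothetical off-by-one variant, for contrast only:
    # words[i - 1] is two positions back, and wraps to words[-1] when i == 0.
    return [word for i, word in enumerate(words[1:]) if words[i - 1] == word]


print(adjacent_doubles("it was was late".split()))
print(adjacent_doubles("go stop go".split()))
print(shifted_by_one("go stop go".split()))
```

On "go stop go" the correct version reports nothing, while the shifted one reports a false double because it compares the last "go" with the first.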

Put the code below into a file named THISfile.py and run it to see what it does:

# myFile = input("Enter file name: ") 
# line No 2: line with with double 'with' 
# line No 3: double (word , word) is not a double word 
myFile="THISfile.py" 
lstUniqueWords = [] 
noOfFoundWordDoubles = 0 
totalNoOfWords  = 0 
lineNo    = 0 
lstLineNumbersWithWordDoubles = [] 
with open(myFile, "r") as myFile:
    for line in myFile:
        lineNo+=1 # memorize current line number
        lineWords = line.split()
        if len(lineWords) > 0: # scan line only if it contains words
            currWord = lineWords[0] # remember already 'visited' word
            totalNoOfWords += 1
            if currWord not in lstUniqueWords:
                lstUniqueWords.append(currWord)
                # put 'visited' word word into lstAllWordsINmyFile (if it is not already there)
            lastWord = currWord # we are done with current, so current becomes last one
            if len(lineWords) > 1 : # proceed only if line has two or more words
                for word in lineWords[1:] : # loop over all other words
                    totalNoOfWords += 1
                    currWord = word
                    if currWord not in lstUniqueWords:
                        lstUniqueWords.append(currWord)
                        # put 'visited' word into lstAllWordsINmyFile (if it is not already there)
                    if(currWord == lastWord): # duplicate word found:
                        noOfFoundWordDoubles += 1
                        print("Found double word: ['{""}'] in line {}".format(currWord, lineNo))
                        lstLineNumbersWithWordDoubles.append(lineNo)
                    lastWord = currWord
                    #  ^--- now after all all work is done, the currWord is considered lastWord
print(
    "noOfDoubles", noOfFoundWordDoubles, "\n", 
    "totalNoOfWords", totalNoOfWords, "uniqueWords", len(lstUniqueWords), "\n", 
    "linesWithDoubles", lstLineNumbersWithWordDoubles 
)

The output should look like this:

Found double word: ['with'] in line 2 
Found double word: ['word'] in line 19 
Found double word: ['all'] in line 33 
noOfDoubles 3 
totalNoOfWords 221 uniqueWords 111 
linesWithDoubles [2, 19, 33]

Now you can check the comments in the code to better understand how it works. Happy coding :)
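For comparison, the same bookkeeping can be written more compactly. This is only a sketch under the same whitespace-splitting assumption. The function name `word_stats` and the sample lines are made up, and a `set` replaces the list of unique words because membership tests on a set do not scan every stored word:

```python
def word_stats(lines):
    """Return (doubles, total words, unique words, line numbers with doubles)."""
    unique_words = set()
    total_words = 0
    doubles = 0
    lines_with_doubles = []
    for line_no, line in enumerate(lines, start=1):
        last_word = None  # reset per line, as in the script above
        for word in line.split():
            total_words += 1
            unique_words.add(word)
            if word == last_word:
                doubles += 1
                lines_with_doubles.append(line_no)
            last_word = word
    return doubles, total_words, len(unique_words), lines_with_doubles


sample = ["a line with with a double", "no doubles here", "all all done"]
print(word_stats(sample))
```

Passing an open file object instead of `sample` should give the same kind of summary for a real file.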

Source

2017-04-03 15:36:50 Claudio
