How to take specific words and print each word and its frequency?

I have a file that contains a list of bands, with each album and the year it was produced. I need to write a function that goes through this file, finds the distinct band names, and counts how many times each band appears in the file.

The file looks like this:

Beatles - Revolver (1966)
Nirvana - Nevermind (1991)
Beatles - Sgt Pepper's Lonely Hearts Club Band (1967)
U2 - The Joshua Tree (1987)
Beatles - The Beatles (1968)
Beatles - Abbey Road (1969)
Guns N' Roses - Appetite For Destruction (1987)
Radiohead - Ok Computer (1997)
Led Zeppelin - Led Zeppelin 4 (1971)
U2 - Achtung Baby (1991)
Pink Floyd - Dark Side Of The Moon (1973)
Michael Jackson -Thriller (1982)
Rolling Stones - Exile On Main Street (1972)
Clash - London Calling (1979)
U2 - All That You Can't Leave Behind (2000)
Weezer - Pinkerton (1996)
Radiohead - The Bends (1995)
Smashing Pumpkins - Mellon Collie And The Infinite Sadness (1995)
.
.
.

The output has to be in descending order of frequency and look like this:

band1: number1
band2: number2
band3: number3

Here is the code I have so far:

def read_albums(filename) :

    file = open("albums.txt", "r")
    bands = {}
    for line in file :
        words = line.split()
        for word in words:
            if word in '-' :
                del(words[words.index(word):])
        string1 = ""
        for i in words :
            list1 = []

            string1 = string1 + i + " "
            list1.append(string1)
        for k in list1 :
            if (k in bands) :
                bands[k] = bands[k] +1
            else :
                bands[k] = 1


    for word in bands :
        frequency = bands[word]
        print(word + ":", len(bands))

I think there is an easier way to do this, but I'm not sure. Also, I don't know how to sort the dictionary by frequency. Do I need to convert it to a list?

Source

2013-08-07 Preston May

'collections.Counter' (http://docs.python.org/2/library/collections.html#collections.Counter) –

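You're right, there is an easier way, using Counter:

from collections import Counter

with open('bandfile.txt') as f:
    # Count the text before the first '-' on every non-blank line.
    counts = Counter(line.split('-')[0].strip() for line in f if line.strip())

# most_common() yields (band, count) pairs, highest count first.
for band, count in counts.most_common():
    print("{0}: {1}".format(band, count))

What exactly is this doing: line.split('-')[0].strip() for line in f if line.strip()?

This expression is a compact form of the following loop:

temp_list = []
for line in f:
    if line.strip():  # this makes sure to skip blank lines
        bits = line.split('-')
        temp_list.append(bits[0].strip())

counts = Counter(temp_list)

However, unlike the loop above, it does not build an intermediate list. Instead, it creates a generator expression, which is a more memory-efficient way to step through the lines, and that generator is passed as the argument to Counter.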
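To make the difference concrete, here is a minimal sketch, assuming Python 3. The sample string is made up from a few lines of the question's file and stands in for the real file. The blank-line check uses line.strip(), because a line holding only a newline is still a non-empty string. Both calls should give the same counts; only the second one stores every band name in a list before counting.

```python
import io
from collections import Counter

# Hypothetical sample standing in for the albums file (note the blank line).
sample = (
    "Beatles - Revolver (1966)\n"
    "Nirvana - Nevermind (1991)\n"
    "\n"
    "Beatles - Abbey Road (1969)\n"
    "U2 - The Joshua Tree (1987)\n"
)

# Generator expression: Counter consumes names one at a time.
lazy_counts = Counter(
    line.split('-')[0].strip() for line in io.StringIO(sample) if line.strip()
)

# List comprehension: all names are held in memory before counting starts.
names = [line.split('-')[0].strip() for line in io.StringIO(sample) if line.strip()]
eager_counts = Counter(names)

print(lazy_counts == eager_counts)
print(lazy_counts["Beatles"])
```

For a file of a few hundred lines the saving is negligible; it only starts to matter when the input is large.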

Source

2013-08-07 16:39:01

'Counter' is only available in Python 2.7 and later. If you are using an earlier version, check the accepted answer here: http://stackoverflow.com/questions/613183/python-sort-a-dictionary-by-value –

I'm still new to Python, so what does the with statement do? Not in this code specifically, but in general. –

http://docs.python.org/2/reference/compound_stmts.html# –
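As a rough illustration of the question about with: for a file object, the statement closes the file when the block ends, even if an exception is raised inside it. The sketch below assumes Python 3 and uses a throwaway temporary file created only for the demonstration. It shows the with form next to an approximately equivalent try/finally form.

```python
import os
import tempfile

# Create a throwaway file so the example is self-contained.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as tmp:
    tmp.write("Beatles - Revolver (1966)\n")

# Form 1: the with statement closes the file automatically.
with open(path) as f:
    first_line = f.readline()
closed_after_with = f.closed

# Form 2: roughly what the with statement does for a file object.
g = open(path)
try:
    same_line = g.readline()
finally:
    g.close()  # runs whether or not readline() raised

os.remove(path)
print(closed_after_with, first_line == same_line)
```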

If you are looking for conciseness, use "defaultdict" and "sorted":

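from collections import defaultdict

bands = defaultdict(int)  # missing keys start at 0
with open('tmp.txt') as f:
    for line in f:  # iterate over the file directly; xreadlines() is obsolete
        if not line.strip():
            continue  # skip blank lines
        band = line.split(' - ')[0].strip()
        bands[band] += 1

# Sort the (band, count) pairs by count, highest first.
for band, count in sorted(bands.items(), key=lambda t: t[1], reverse=True):
    print('%s: %d' % (band, count))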
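One detail worth checking against the sample data: the line "Michael Jackson -Thriller (1982)" has no space after the hyphen, so splitting on ' - ' leaves the whole line as the band name, while splitting on a bare '-' would cut a band name that itself contains a hyphen. A possible compromise, sketched here under the assumption that the first hyphen preceded by whitespace always separates band from album, is a small regular expression. The band_name helper is hypothetical, and the Jay-Z line is an illustrative input that does not come from the question's file.

```python
import re

# Separator: a hyphen with whitespace before it and optional whitespace after.
_SEPARATOR = re.compile(r"\s+-\s*")

def band_name(line):
    """Return the text before the first ' -' separator, stripped."""
    return _SEPARATOR.split(line, maxsplit=1)[0].strip()

print(band_name("Beatles - Revolver (1966)\n"))
print(band_name("Michael Jackson -Thriller (1982)\n"))
print(band_name("Jay-Z - The Blueprint (2001)\n"))  # illustrative line, not in the file
```

A blank line yields an empty string from this helper, so the caller would still need to skip empty names before counting.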

Source

2013-08-07 16:42:59 thierrybm

Why sorted? The question doesn't ask for sorted output. 'collections.Counter().most_common()' would be more concise, because it returns the items in descending order of frequency. –

True; I hadn't seen the Counter solution when I wrote this, and it is better! – thierrybm
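The two approaches discussed in these comments give the same ordering when the counts differ. A small sketch, assuming Python 3 and a hand-written list of band names in place of the parsed file:

```python
from collections import Counter

# Hypothetical band names, as if already extracted from the file.
bands = ["Beatles", "Nirvana", "Beatles", "U2", "Beatles", "U2"]
counts = Counter(bands)

# sorted() on the (band, count) pairs, highest count first.
by_sorted = sorted(counts.items(), key=lambda t: t[1], reverse=True)

# most_common() returns the same pairs in the same order.
by_most_common = counts.most_common()

for band, count in by_most_common:
    print("{0}: {1}".format(band, count))
```

If only the top few bands are needed, most_common(n) returns just the first n pairs.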

My approach is to use the split() method to break each line of the file into a list of its constituent tokens. You can then grab the band name (the first token in the list) and add it to a dictionary to keep track of the counts:

import operator

def main():
    band_counts = {}

    # build a dictionary that adds each band as it is listed, then increments the count for re-lists
    with open("albums.txt", "r") as f:
        for line in f:
            line_items = line.split("-")  # break up the line into individual tokens
            band = line_items[0].strip()

            # don't want to add blank lines to the band list
            if not band:
                continue

            if band in band_counts:
                band_counts[band] += 1  # band already in the counts, increment the count
            else:
                band_counts[band] = 1  # band not in counts yet, add it with a count of 1

    # create a list of results sorted by count, highest first
    sorted_list = sorted(band_counts.items(), key=operator.itemgetter(1), reverse=True)

    for item in sorted_list:
        print(item[0] + ":", item[1])

if __name__ == "__main__":
    main()

Notes:

I followed the advice in this answer to create the sorted results: Sort a Python dictionary by value
If you are new to Python, check out Google's Python Class. I found it very helpful when I was first starting out. https://developers.google.com/edu/python/?csw=1

Source

2013-08-07 17:38:11 caffreyd
