Converting word frequencies to a graphical histogram in Python

Thanks to Pavel Anossov, this is what I have so far. I am trying to convert the word frequencies it prints into asterisks.

from collections import Counter

def candidateWord():
    # Read the whole file into one string.
    with open("sample.txt", 'r') as f:
        text = f.read()
    # Lower-case the text, split on whitespace, and strip surrounding punctuation.
    words = [w.strip('!,@#$%^&*()_+') for w in text.lower().split()]
    counter = Counter(words)

    # Print each word with its count, most frequent first.
    print("\n".join("{} {}".format(*p) for p in counter.most_common()))

candidateWord()

This is the output I get right now:

how 3
i 2
am 2
are 2
you 2
good 1
hbjkdfd 1

The formula I am trying to use: if the most frequent word occurs M times and the current word occurs N times, the number of asterisks printed is

(50 * N)/M
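
As a quick check of the formula against the sample output above, here is a minimal sketch. The function name asterisk_count and the hard-coded counts are illustrative assumptions, not part of the original code. Integer division is assumed because a string can only be repeated a whole number of times.

```python
def asterisk_count(n, m, width=50):
    """Asterisks for a word seen n times when the top word is seen m times."""
    # Floor division keeps the result usable as a string repeat count.
    return (width * n) // m

# Counts taken from the sample output; the most frequent word occurs 3 times.
sample_counts = {"how": 3, "i": 2, "good": 1}
m = max(sample_counts.values())
for word, n in sample_counts.items():
    print(word, asterisk_count(n, m))
```

With M = 3, a word seen twice gets (50 * 2) // 3 = 33 asterisks and a word seen once gets 16.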

Source

2013-03-31 Tosh

I would put the asterisks on the left, so that words of different lengths do not break the alignment:

... 
counter = Counter(words) 
max_freq = counter.most_common()[0][1] 
for word, freq in sorted(counter.most_common(), key=lambda p: (-p[1], p[0])): 
    number_of_asterisks = (50 * freq) // max_freq  # (50 * N)/M 
    asterisks = '*' * number_of_asterisks  # the (50*N)/M asterisks 
    print('{:>50} {}'.format(asterisks, word))

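The :>50 format spec means "right-align in a field 50 characters wide, padding with spaces on the left".

counter.most_common() returns a list of (word, frequency) pairs sorted by descending frequency, so the maximum frequency is the second element of the first pair: counter.most_common()[0][1].

We loop over counter.most_common(), sorted by descending frequency first and then by word.
number_of_asterisks is computed from the formula. Integer division // is used to get an integer result.
We repeat the asterisk '*' number_of_asterisks times and store the result in asterisks.
We print asterisks and word. The asterisks are right-aligned in a 50-character column.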
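
To see the right-aligned bars without needing sample.txt, the sketch below runs the same steps on a hard-coded sentence. The helper name histogram_lines, the sample sentence, and the narrower 20-character column are assumptions made to keep the demo short.

```python
from collections import Counter

WIDTH = 20  # narrower than the 50 used above, only to keep the demo compact

def histogram_lines(counter):
    """Return one line per word: a right-aligned bar, a space, then the word."""
    max_freq = counter.most_common(1)[0][1]
    lines = []
    # Sort by descending frequency, then alphabetically to break ties.
    for word, freq in sorted(counter.items(), key=lambda p: (-p[1], p[0])):
        bar = '*' * ((WIDTH * freq) // max_freq)
        # ':>20' pads the bar with spaces on the left up to 20 characters.
        lines.append('{:>20} {}'.format(bar, word))
    return lines

counter = Counter("how are you how am i how are you".split())
lines = histogram_lines(counter)
print("\n".join(lines))
```

Because every bar occupies exactly 20 characters, the words all start in the same column regardless of their length.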

Source

2013-03-31 22:45:39

Can you explain what is happening in the for loop? (I am very new to Python.) – Tosh

Also, I get an error: can't multiply sequence by non-int of type 'float' – Tosh
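
One common cause of this error is multiplying a string by the result of true division: in Python 3, / always yields a float, and a string can only be repeated an integer number of times. A minimal sketch of the failure and two ways around it (the variable names and values are illustrative, not taken from the thread):

```python
freq, max_freq = 2, 3

# In Python 3, / is true division and always yields a float.
ratio = (50 * freq) / max_freq
try:
    bar = '*' * ratio
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Floor division keeps the count an int, so the repeat works.
bar_floor = '*' * ((50 * freq) // max_freq)

# Alternatively, convert the float explicitly (truncates toward zero).
bar_int = '*' * int(ratio)
print(len(bar_floor), len(bar_int))
```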

Can the columns be switched, meaning the word on the left and the asterisks on the right? If I had to guess, I would change it to: print('{:>50} {}'.format(key, asterisks)) – Tosh
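
On switching the columns: with > the first field stays right-aligned, so the guess above would push each word to the right edge of a 50-character field. One possible layout, assuming a left-aligned word column sized to the longest word (the hard-coded counts and the names word_width, row_format and rows are illustrative):

```python
from collections import Counter

counter = Counter({"how": 3, "are": 2, "good": 1})
max_freq = max(counter.values())

# Size the word column to the longest word so every bar starts in the same column.
word_width = max(len(word) for word in counter)
# '<' left-aligns the word; the width is substituted in with % first.
row_format = '{:<%d} {}' % word_width

rows = []
for word, freq in counter.most_common():
    asterisks = '*' * ((50 * freq) // max_freq)
    rows.append(row_format.format(word, asterisks))

print("\n".join(rows))
```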

Code:

from collections import Counter

def candidateWord():
    with open("sample.txt", 'r') as f:
        text = f.read()
    words = [w.strip('!,@#$%^&*()_+') for w in text.lower().split()]
    counter = Counter(words)

    # I added the code below...
    columns = 80         # total width of the chart in characters
    n_occurrences = 10   # how many of the most common words to plot
    to_plot = counter.most_common(n_occurrences)
    labels, values = zip(*to_plot)
    label_width = max(map(len, labels))
    data_width = columns - label_width - 1
    plot_format = '{:%d}|{:%d}' % (label_width, data_width)
    max_value = float(max(values))
    for i in range(len(labels)):
        # Scale each count so the most frequent word fills the data column.
        v = int(values[i] / max_value * data_width)
        print(plot_format.format(labels[i], '*' * v))

candidateWord()

Output:

the |***************************************************************************
and |**********************************************
of  |******************************************
to  |***************************
a   |************************
in  |********************
that|******************
i   |****************
was |*************
it  |**********

Source

2013-03-31 22:46:02 Robottinosino

The chart should have 2 columns: words and asterisks. – Tosh

I have provided a code example with 2 columns. Defining the "scale" is still the key part (e.g. when there are statistical outliers) – Robottinosino

Fun fact: you can nest fields in format strings. Try this: "{:>{width}}".format('test', width=15) –
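
A small sketch of the nested-field tip. Using it to build a chart row is an assumption about how the tip might be applied, not something stated in the thread; the widths and the sample row are illustrative.

```python
# The inner {width} is substituted first, giving the spec ':>15'.
padded = "{:>{width}}".format('test', width=15)
print(repr(padded))

# The same idea can replace building the format string with % beforehand.
label_width, data_width = 4, 10
row = "{:{lw}}|{:{dw}}".format('of', '*' * 6, lw=label_width, dw=data_width)
print(repr(row))
```

Strings are left-aligned by default, so the label and the bar are each padded on the right to their column width.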
