Python program that finds the most frequent word in a .txt file and prints the word and its count

Question

Apr 30, 2012, 11:42 PM

So far, I have written a function to replace the countChars function:

def countWords(lines):
  wordDict = {}
  for line in lines:
    wordList = lines.split()
    for word in wordList:
      if word in wordDict: wordDict[word] += 1
      else: wordDict[word] = 1
  return wordDict

But when I run the program, it spits out this abomination (this is only a sample; there are about two pages of words, each with a huge count next to it):

before 1478
battle-field 1478
as 1478
any 1478
altogether 1478
all 1478
ago 1478
advanced. 1478
add 1478
above 1478

Obviously this means the code is sound enough to run, but I'm not getting what I want from it. It needs to print how many times each word appears in the file (gb.txt, which is the Gettysburg Address). Clearly, each word in the file does not appear exactly 1478 times.

I'm very new to programming, so I'm a bit stumped.
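
One possible explanation for the identical counts, assuming countWords is called with the single string that readfile returns (see the full listing that follows) rather than with a list of lines: looping over a string visits one character at a time, and lines.split() splits the whole text again on every pass, so each word's count is multiplied by the number of characters in the file. The sketch below reproduces that effect on a made-up sample string; countWordsBuggy and sample are illustrative names, not part of the original program.

```python
# Hypothetical stand-in for the contents of gb.txt (not the real file).
sample = "four score and seven years ago"

def countWordsBuggy(lines):
    # Same logic as the countWords function in the question.
    wordDict = {}
    for line in lines:              # 'lines' is a string, so 'line' is a single character
        wordList = lines.split()    # splits the WHOLE text, not the current line
        for word in wordList:
            if word in wordDict: wordDict[word] += 1
            else: wordDict[word] = 1
    return wordDict

counts = countWordsBuggy(sample)
# Each word occurs once in the sample, yet every count equals len(sample).
print(len(sample))
print(counts["score"])
```

If this is what happens in the real program, 1478 would be the number of characters in gb.txt, and any word that occurs more than once would show a multiple of 1478.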

from __future__ import division

inputFileName = 'gb.txt'

def readfile(fname):
  f = open(fname, 'r')
  s = f.read()
  f.close()
  return s.lower()

def countChars(t):
  charDict = {}
  for char in t:
    if char in charDict: charDict[char] += 1
    else: charDict[char] = 1
  return charDict

def findMostCommon(charDict):
  mostFreq = ''
  mostFreqCount = 0
  for k in charDict:
    if charDict[k] > mostFreqCount:
      mostFreqCount = charDict[k]
      mostFreq = k
  return mostFreq

def printCounts(charDict):
  for k in charDict:
    #First, handle some chars that don't show up very well when they print
    if k == '\n': print '\\n', charDict[k]  #newline
    elif k == ' ': print 'space', charDict[k]
    elif k == '\t': print '\\t', charDict[k] #tab
    else: print k, charDict[k]  #Normal character - print it with its count

def printAlphabetically(charDict):
  keyList = charDict.keys()
  keyList.sort()
  for k in keyList:
    #First, handle some chars that don't show up very well when they print
    if k == '\n': print '\\n', charDict[k]  #newline
    elif k == ' ': print 'space', charDict[k]
    elif k == '\t': print '\\t', charDict[k] #tab
    else: print k, charDict[k]  #Normal character - print it with its count

def printByFreq(charDict):
  aList = []
  for k in charDict:
    aList.append([charDict[k], k])
  aList.sort()     #Sort into ascending order
  aList.reverse()  #Put in descending order
  for item in aList:
    #First, handle some chars that don't show up very well when they print
    if item[1] == '\n': print '\\n', item[0]  #newline
    elif item[1] == ' ': print 'space', item[0]
    elif item[1] == '\t': print '\\t', item[0] #tab
    else: print item[1], item[0]  #Normal character - print it with its count

def main():
  text = readfile(inputFileName)
  charCounts = countChars(text)
  mostCommon = findMostCommon(charCounts)
  #print mostCommon + ':', charCounts[mostCommon]
  #printCounts(charCounts)
  #printAlphabetically(charCounts)
  printByFreq(charCounts)

main()
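
A minimal sketch of one way to count words and report the most frequent one, assuming the text arrives as one lowercased string, as readfile returns it. The name findMostCommonWord, the sample text, and the decision to strip punctuation from the edges of each word (so that "advanced." and "advanced" count together) are assumptions made for illustration, not part of the original program. The print call is written so that it runs under both Python 2 and Python 3.

```python
import string

def countWords(text):
    # Count each whitespace-separated word in a single string.
    wordDict = {}
    for word in text.split():
        # Assumption: punctuation at the edges should not create a separate word.
        word = word.strip(string.punctuation)
        if not word:
            continue  # the token consisted of punctuation only
        if word in wordDict: wordDict[word] += 1
        else: wordDict[word] = 1
    return wordDict

def findMostCommonWord(wordDict):
    # Return (word, count) for the highest count; on a tie, the first one seen wins.
    mostFreq = ''
    mostFreqCount = 0
    for k in wordDict:
        if wordDict[k] > mostFreqCount:
            mostFreqCount = wordDict[k]
            mostFreq = k
    return mostFreq, mostFreqCount

# Made-up sample text standing in for readfile('gb.txt').
text = "the cat saw the dog. the dog ran."
word, count = findMostCommonWord(countWords(text))
print("%s %d" % (word, count))
```

In main(), countWords(text) would take the place of the countChars(text) call; printByFreq should work unchanged, since it only needs a dictionary of counts. In Python 2.7 and later, collections.Counter(text.split()).most_common(1) is a shorter alternative for the counting step, without the punctuation stripping.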
