Python: How to update the value of a key-value pair in a nested dictionary?

Question

Feb 22, 2011, 04:00 PM

I am trying to build an inverted document index, so for every unique word in a collection I need to know which documents it appears in and how often.

I used an earlier answer to create a nested dictionary. The solution it provides works well, but I have run into one problem.

First I open the file and make a list of its unique words. I then want to compare these unique words against the original file. When there is a match, the frequency counter should be updated and its value stored in the two-dimensional array.

The output should eventually look like this:

word1, {doc1 : freq}, {doc2 : freq}
word2, {doc1 : freq}, {doc2 : freq}, {doc3 : freq}
etc....

The problem is that I cannot update the value in the dictionary. When I try, I get this error:

  File "scriptV3.py", line 45, in main
    freq = dictionary[keyword][filename] + 1
TypeError: unsupported operand type(s) for +: 'AutoVivification' and 'int'

I think I need to somehow cast the AutoVivification instance to int...
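A minimal sketch of where the error seems to come from, reusing the AutoVivification class from this question. Looking up a key that has never been assigned does not return None or 0: it creates and returns a new, empty AutoVivification, and adding 1 to that object raises the TypeError. The keys 'word1' and 'doc_1' are made-up examples.

```python
class AutoVivification(dict):
    """Same class as in the question: missing keys create nested dicts."""
    def __getitem__(self, item):
        try:
            return dict.__getitem__(self, item)
        except KeyError:
            value = self[item] = type(self)()
            return value

dictionary = AutoVivification()

# Reading a never-assigned key creates an empty AutoVivification on the spot,
# so an "is not None" check on it always passes.
missing = dictionary['word1']['doc_1']
is_autoviv = isinstance(missing, AutoVivification)

# Adding an int to that empty dict subclass is what raises the TypeError.
try:
    freq = dictionary['word1']['doc_1'] + 1
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(is_autoviv, error_message)
```

So the stored value is not an integer waiting to be cast. No integer was ever stored under that key before it was read.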

How do I go about this?

Thanks in advance.

My code:

#!/usr/bin/env python 
# encoding: utf-8

import sys
import os
import re
import glob
import string
import sets

class AutoVivification(dict):
    """Implementation of perl's autovivification feature."""
    def __getitem__(self, item):
        try:
            return dict.__getitem__(self, item)
        except KeyError:
            value = self[item] = type(self)()
            return value

def main():
    pad = 'temp/'
    dictionary  = AutoVivification()
    docID = 0
    for files in glob.glob( os.path.join(pad, '*.html') ):  #for all files in specified folder:
        docID = docID + 1
        filename = "doc_"+str(docID)
        text = open(files, 'r').read()                      #returns content of file as string
        text = extract(text, '<pre>', '</pre>')             #call extract function to extract text from within <pre> tags
        text = text.lower()                                 #all words to lowercase
        exclude = set(string.punctuation)                   #sets list of all punctuation characters
        text = ''.join(char for char in text if char not in exclude) # use created exclude list to remove characters from files
        text = text.split()                                 #creates list (array) from string
        uniques = set(text)                                 #make list unique (is that useful? we still have to count)

        for keyword in uniques:                             #For every unique word do   
            for word in text:                               #for every word in doc:
                if (word == keyword and dictionary[keyword][filename] is not None): #if there is an occurrence of keyword increment counter
                    freq = dictionary[keyword][filename]    #here we fail, cannot cast object instance to integer.
                    freq = dictionary[keyword][filename] + 1
                    print(keyword,dictionary[keyword])
                else:
                    dictionary[word][filename] = 1

#extract text between substring 1 and 2 
def extract(text, sub1, sub2): 
    return text.split(sub1, 1)[-1].split(sub2, 1)[0]    

if __name__ == '__main__':
    main()
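
One possible way around the error, sketched under assumptions rather than as a definitive fix: count the words of each document with collections.Counter and store plain integers in a defaultdict(dict), so the frequency is always assigned before it is read. The function name `build_index` and the sample `docs` mapping are hypothetical. Reading the files and the `extract` step are left out to keep the sketch self-contained.

```python
import string
from collections import Counter, defaultdict

def build_index(docs):
    """Map each word to {document name: frequency}.

    `docs` is a hypothetical dict of {document name: raw text}.
    """
    index = defaultdict(dict)
    exclude = set(string.punctuation)
    for filename, text in docs.items():
        # Lowercase and strip punctuation, as in the question's code.
        cleaned = ''.join(ch for ch in text.lower() if ch not in exclude)
        # Counter tallies every word in one pass, so no inner loop is needed.
        for word, freq in Counter(cleaned.split()).items():
            index[word][filename] = freq
    return dict(index)

# Made-up sample data standing in for the extracted <pre> text of two files.
docs = {
    'doc_1': 'The cat sat. The cat ran!',
    'doc_2': 'A cat slept.',
}
index = build_index(docs)
for word in sorted(index):
    print(word, index[word])
```

If the AutoVivification class is kept instead, the same idea applies: assign the count directly, for example `dictionary[keyword][filename] = text.count(keyword)`, rather than reading the value before it has been set.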