LookupError: Resource 'corpora/stopwords' not found

Question

Jun 08, 2014, 05:02 PM

I am trying to run a web app on Heroku using Flask. The web app is written in Python and uses NLTK (the Natural Language Toolkit library).

One of the files has the following header:

import nltk, json, operator
from nltk.corpus import stopwords 
from nltk.tokenize import RegexpTokenizer

When the web page that runs the stop word code is requested, it produces the following error:

LookupError: 
**********************************************************************
  Resource 'corpora/stopwords' not found.  Please use the NLTK  
  Downloader to obtain the resource:  >>> nltk.download()  
  Searched in:  
    - '/app/nltk_data'  
    - '/usr/share/nltk_data'  
    - '/usr/local/share/nltk_data'  
    - '/usr/lib/nltk_data'  
    - '/usr/local/lib/nltk_data'  
**********************************************************************
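As a diagnostic aid, the "Searched in:" list above can be checked directly on the machine that raises the error. The following is a minimal sketch, not NLTK's actual lookup code: the helper name `dirs_containing` is hypothetical, and the check (an unpacked directory or a `.zip` file under each search directory) is a simplified approximation of how NLTK locates data.

```python
import os

# Directories copied from the "Searched in:" list of the error message.
SEARCHED_DIRS = [
    "/app/nltk_data",
    "/usr/share/nltk_data",
    "/usr/local/share/nltk_data",
    "/usr/lib/nltk_data",
    "/usr/local/lib/nltk_data",
]


def dirs_containing(resource, search_dirs):
    """Return the search dirs that hold `resource`, unpacked or zipped.

    Hypothetical helper: a simplified approximation of NLTK's lookup,
    not its real implementation.
    """
    hits = []
    for base in search_dirs:
        unpacked = os.path.join(base, *resource.split("/"))
        zipped = unpacked + ".zip"
        if os.path.exists(unpacked) or os.path.exists(zipped):
            hits.append(base)
    return hits


if __name__ == "__main__":
    found = dirs_containing("corpora/stopwords", SEARCHED_DIRS)
    if found:
        print("corpora/stopwords found in:", found)
    else:
        print("corpora/stopwords is missing from every searched directory")
```

An empty result on the server and a non-empty one locally would suggest that the corpus data was downloaded only on the local machine.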

The exact code used:

#remove punctuation  
toker = RegexpTokenizer(r'((?<=[^\w\s])\w(?=[^\w\s])|(\W))+', gaps=True) 
data = toker.tokenize(data)  

#remove stop words and digits 
stopword = stopwords.words('english')  
data = [w for w in data if w not in stopword and not w.isdigit()]

The web app on Heroku does not raise the LookupError when the line stopword = stopwords.words('english') is commented out.
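This observation is consistent with lazy loading: NLTK's corpus objects are resolved on first use, not at import time, so the import line alone does not touch the data directories. The sketch below illustrates the pattern only. `LazyResource` and `missing_loader` are hypothetical names written for this example and are not NLTK classes or functions.

```python
class LazyResource:
    """Hypothetical stand-in for a lazily loaded corpus (not NLTK's class)."""

    def __init__(self, name, loader):
        self._name = name
        self._loader = loader  # called only on first real use
        self._loaded = None

    def __getattr__(self, attr):
        # Python calls __getattr__ only for attributes it cannot find
        # normally, so this runs when the resource is actually used.
        if self._loaded is None:
            self._loaded = self._loader(self._name)
        return getattr(self._loaded, attr)


def missing_loader(name):
    """Simulate a data lookup that fails, as on the server."""
    raise LookupError("Resource '%s' not found." % name)


# Creating the object (comparable to the import) raises nothing.
stopwords = LazyResource("corpora/stopwords", missing_loader)

try:
    stopwords.words("english")  # first use triggers the lookup
except LookupError as err:
    print("raised on first use:", err)
```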

The code runs without errors on my local computer. I installed the required libraries on my computer using