Stanford NER in NLTK not tagging multiple sentences correctly - Python

Question

Mar 09, 2016, 07:26 PM

nlp nltk named-entity-recognition stanford-nlp python

I have a function that returns the named entities in a given body of text using the Stanford NER tagger.

def get_named_entities(text):
    load_ner_files()

    print text[:100] # to show that the text is fine
    text_split = text.split()        
    print text_split # to show the split is working fine
    result = "named entities = ", st.tag(text_split)
    return result

I load the text from a URL using the Python newspaper package.

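import unicodedata
from newspaper import Article

def get_page_text():
    url = "https://aeon.co/essays/elon-musk-puts-his-case-for-a-multi-planet-civilisation"
    page = Article(url)
    page.download()
    page.parse()
    # Decompose accented characters, then drop anything that is not plain ASCII
    return unicodedata.normalize('NFKD', page.text).encode('ascii', 'ignore')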

However, when I run the function I get the following output:

['Fuck', 'Earth!', 'Elon', 'Musk', 'said', 'to', 'me,', 'laughing.', 'Who', 'cares', 'about', 'Earth?'......... (continued)
named entities = [('Fuck', 'O'), ('Earth', 'O'), ('!', 'O')]

So my question is: why are only the first three words tagged?
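
One thing that might be worth trying: the output above shows 'Earth!' coming back as 'Earth' and '!', and the result stops right at the end of the first sentence, which would be consistent with only the first sentence being returned when the whole text is passed as a single token list. Below is a minimal sketch (Python 3) of splitting the text into sentences first and tagging them all in one call. It assumes `tagger` is an object that exposes `tag_sents`, as NLTK's `StanfordNERTagger` does. `split_into_sentences` and `tag_all_sentences` are hypothetical helper names, and the regex splitting is only a rough stand-in for a real tokenizer such as NLTK's `sent_tokenize` and `word_tokenize`.

```python
import re


def split_into_sentences(text):
    """Naively split text into sentences, each as a list of tokens."""
    # Rough heuristic: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    # Separate words from punctuation so that 'Earth!' becomes 'Earth', '!'.
    return [re.findall(r"\w+|[^\w\s]", s) for s in sentences if s]


def tag_all_sentences(tagger, text):
    """Tag every sentence, then flatten the results into one list of pairs.

    `tagger` is assumed to provide tag_sents(list_of_token_lists).
    """
    tagged_sentences = tagger.tag_sents(split_into_sentences(text))
    return [pair for sentence in tagged_sentences for pair in sentence]
```

With the tagger from the question, this would be called as `tag_all_sentences(st, text)` in place of `st.tag(text.split())`. Whether it resolves the truncation depends on the NLTK and Stanford NER versions in use.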