BeautifulSoup - Search for text inside a tag

Question

Aug 12, 2015, 09:26 AM

Consider the following problem:

import re
from bs4 import BeautifulSoup as BS

soup = BS("""
<a href="/customer-menu/1/accounts/1/update">
    Edit
</a>
""")

# This returns the <a> element
soup.find(
    'a',
    href="/customer-menu/1/accounts/1/update",
    text=re.compile(".*Edit.*")
)

soup = BS("""
<a href="/customer-menu/1/accounts/1/update">
    <i class="fa fa-edit"></i> Edit
</a>
""")

# This returns None
soup.find(
    'a',
    href="/customer-menu/1/accounts/1/update",
    text=re.compile(".*Edit.*")
)

For some reason, BeautifulSoup does not match the text when the <i> tag is also present. Finding the tag and printing its text gives:

>>> a2 = soup.find(
        'a',
        href="/customer-menu/1/accounts/1/update"
    )
>>> print(repr(a2.text))
'\n Edit\n'
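A minimal check of what the text= filter is actually compared against, assuming bs4 with the built-in "html.parser". When find() combines a tag name with text= (called string= since bs4 4.4, with text= still accepted), Beautiful Soup's documentation says it returns tags whose .string matches. .string is only defined for a tag with a single child. This <a> has three children, so .string is None even though .text / get_text() still contains "Edit".

```python
from bs4 import BeautifulSoup

# Same markup as the second example in the question.
html = """
<a href="/customer-menu/1/accounts/1/update">
    <i class="fa fa-edit"></i> Edit
</a>
"""
soup = BeautifulSoup(html, "html.parser")
a = soup.find("a", href="/customer-menu/1/accounts/1/update")

# Children: the whitespace string, the <i> tag, and the " Edit\n" string.
print(len(a.contents))       # 3

# More than one child, so .string is None; this is the value the
# text=/string= filter is matched against, hence the None result.
print(a.string)              # None

# get_text() joins every descendant string, so "Edit" is still present.
print("Edit" in a.get_text())  # True
```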

Right. As I read the docs, soup uses the regular expression's match function, not its search function. So I need to supply the DOTALL flag:

pattern = re.compile('.*Edit.*')
pattern.match('\n Edit\n')  # Returns None

pattern = re.compile('.*Edit.*', flags=re.DOTALL)
pattern.match('\n Edit\n')  # Returns a Match object
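For comparison, here is the same pattern under the standard re module's relevant behaviors, using only the standard library: match() must succeed at index 0, search() retries from each start position, and re.DOTALL lets "." also consume newlines. Beautiful Soup's documentation states that a compiled pattern is applied with its search() method, so the search() line is the closest model of what the filter does.

```python
import re

s = '\n Edit\n'  # the string printed above

plain = re.compile('.*Edit.*')
dotall = re.compile('.*Edit.*', flags=re.DOTALL)

# match() must succeed at index 0, and '.' cannot consume the leading '\n'.
print(plain.match(s))                    # None
# search() retries from each position and finds the match on the second line.
print(repr(plain.search(s).group()))     # ' Edit'
# With DOTALL, '.*' may span newlines, so match() covers the whole string.
print(repr(dotall.match(s).group()))     # '\n Edit\n'
# When searching, the '.*' padding is unnecessary: a bare 'Edit' is enough.
print(re.search('Edit', s) is not None)  # True
```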

OK, that looks good. Let's try it with soup:

soup = BS("""
<a href="/customer-menu/1/accounts/1/update">
    <i class="fa fa-edit"></i> Edit
</a>
""")

soup.find(
    'a',
    href="/customer-menu/1/accounts/1/update",
    text=re.compile(".*Edit.*", flags=re.DOTALL)
)  # Still returns None... Why?!
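One way to get the intended result, sketched as an alternative and not as the code from any answer: find() also accepts a function, which Beautiful Soup calls with each tag, so the predicate can test get_text() (all descendant text) instead of .string. The names HREF and is_edit_link are hypothetical, and the sketch assumes bs4 with "html.parser".

```python
from bs4 import BeautifulSoup

html = """
<a href="/customer-menu/1/accounts/1/update">
    <i class="fa fa-edit"></i> Edit
</a>
"""
soup = BeautifulSoup(html, "html.parser")

HREF = "/customer-menu/1/accounts/1/update"  # hypothetical constant


def is_edit_link(tag):
    # Hypothetical predicate: Beautiful Soup calls it once per tag.
    return (
        tag.name == "a"
        and tag.get("href") == HREF
        # get_text() includes text from every descendant, so the sibling
        # <i> tag no longer hides the "Edit" string.
        and "Edit" in tag.get_text()
    )


link = soup.find(is_edit_link)
print(link["href"])  # /customer-menu/1/accounts/1/update
```

The trade-off is that get_text() also sees text inside nested tags, so on larger pages the predicate may match more than intended.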

EDIT

My solution, based on Geckon's answer: I implemented these helpers:

import re

MATCH_ALL = r'.*'


def like(string):
    """
    Return a compiled regular expression that matches the given
    string with any prefix and postfix, e.g. if string = "hello",
    the returned regex matches r".*hello.*"
    """
    string_ = string
    if not isinstance(string_, str):
        string_ = str(string_)
    regex = MATCH_ALL + re.escape(string_) + MATCH_ALL
    return re.compile(regex, flags=re.DOTALL)


def find_by_text(soup, text, tag, **kwargs):
    """
    Find the tag in soup that matches all provided kwargs, and contains the
    text.

    If no match is found, return None.
    If more than one match is found, raise ValueError.
    """
    elements = soup.find_all(tag, **kwargs)
    matches = []
    for element in elements:
        if element.find(text=like(text)):
            matches.append(element)
    if len(matches) > 1:
        raise ValueError("Too many matches:\n" + "\n".join(str(match) for match in matches))
    elif len(matches) == 0:
        return None
    else:
        return matches[0]

Now, when I want to find the element above, I simply run find_by_text(soup, 'Edit', 'a', href='/customer-menu/1/accounts/1/update')
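For illustration, a condensed, self-contained restatement of the two helpers (not a verbatim copy), run on the question's markup. It assumes Python 3 and bs4 with "html.parser", uses string= in place of the older text= keyword, and formats the error message with str(m) for each match, since str.join() only accepts strings and would raise TypeError on Tag objects.

```python
import re
from bs4 import BeautifulSoup


def like(string):
    # Escape the literal text; DOTALL lets '.*' span newlines.
    return re.compile(r'.*' + re.escape(str(string)) + r'.*', flags=re.DOTALL)


def find_by_text(soup, text, tag, **kwargs):
    # Keep only tags that have a descendant string containing `text`.
    matches = [
        element for element in soup.find_all(tag, **kwargs)
        if element.find(string=like(text)) is not None
    ]
    if len(matches) > 1:
        # str() each Tag: str.join() rejects non-string items.
        raise ValueError("Too many matches:\n" + "\n".join(str(m) for m in matches))
    return matches[0] if matches else None


soup = BeautifulSoup("""
<a href="/customer-menu/1/accounts/1/update">
    <i class="fa fa-edit"></i> Edit
</a>
""", "html.parser")

found = find_by_text(soup, 'Edit', 'a', href='/customer-menu/1/accounts/1/update')
print(found is not None)  # True
```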

Answers to the question (6)
