Python: parsing the text between two words

Question

Nov 22, 2012, 03:01 AM


I am using BeautifulSoup and want to extract all of the text between two words on a web page.

For example, imagine the following text on a website:

This is the text of the webpage. It is just a string of a bunch of stuff and maybe some tags in between.

I want to pull out everything on the page that starts with "text" and ends with "bunch".

In this case, I want only:

text of the webpage. It is just a string of a bunch

However, there may be several such occurrences on the page.

What is the best way to do this?

This is my current setup:

#!/usr/bin/env python
import re

from mechanize import Browser
from BeautifulSoup import BeautifulSoup

mech = Browser()
urls = [
    "http://ca.news.yahoo.com/forget-phoning-business-app-sends-text-instead-100143774--sector.html",
]


def visible(element):
    # If the parent of the element is any of these tags, ignore it
    if element.parent.name in ['style', 'script', '[document]', 'head', 'title']:
        return False
    # If the element is an HTML comment, ignore it.
    # The pattern was empty in the post as extracted; '<!--.*-->' is assumed here.
    elif re.match('<!--.*-->', str(element)):
        return False
    # Otherwise, return True as these are the elements we need
    else:
        return True


for url in urls:
    page = mech.open(url)
    html = page.read()
    soup = BeautifulSoup(html)
    text = soup.prettify()
    # Collect every text node in the document
    texts = soup.findAll(text=True)

    # filter() returns only those items of texts for which visible() is True.
    # We use those to build our final list.
    visible_texts = filter(visible, texts)

    for line in visible_texts:
        print(line)
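
One possible way to get the spans between the two words, as a minimal sketch rather than a definitive answer: join the visible text into a single string and run a non-greedy regular expression over it. The helper names extract_between and normalize are hypothetical (they are not part of BeautifulSoup or mechanize), and the sketch assumes that matching the two words as plain substrings is acceptable.

```python
import re


def extract_between(text, start, end):
    """Return every shortest span of text that begins with start and ends with end."""
    # re.escape treats both words literally. The non-greedy .*? stops at the
    # first end word after each start word, so several occurrences on one page
    # come back as separate matches. re.DOTALL lets a span cross line breaks.
    pattern = re.compile(re.escape(start) + r".*?" + re.escape(end), re.DOTALL)
    return pattern.findall(text)


def normalize(chunks):
    """Join text chunks into one string and collapse runs of whitespace."""
    return " ".join(" ".join(chunks).split())


sample = ("This is the text of the webpage. It is just a string of a bunch "
          "of stuff and maybe some tags in between.")
print(extract_between(sample, "text", "bunch"))
# prints ['text of the webpage. It is just a string of a bunch']
```

With the setup above, the call would look something like extract_between(normalize(visible_texts), "text", "bunch"). Because the words are matched as substrings, "text" would also match inside "context"; wrapping each escaped word in \b word boundaries is one way to tighten that if needed.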
