Converting html to text with Python

soup.get_text() outputs what you want:

from bs4 import BeautifulSoup
soup = BeautifulSoup(html)
print(soup.get_text())

output:

Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aenean commodo ligula eget dolor. Aenean massa
Consectetuer adipiscing elit. Some Link Aenean commodo ligula eget dolor. Aenean massa
Aenean massa.Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aenean commodo ligula eget dolor. Aenean massa
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aenean commodo ligula eget dolor. Aenean massa
Consectetuer adipiscing elit. Aenean commodo ligula eget dolor. Aenean massa

To keep newlines:

print(soup.get_text('\n'))

To be identical to your example, you can replace a newline with two newlines:

soup.get_text().replace('\n','\n\n')

Leave a Comment

Hata!: SQLSTATE[HY000] [1045] Access denied for user 'divattrend_liink'@'localhost' (using password: YES)