How to detect with python if the string contains html code?

You can use an HTML parser, like BeautifulSoup. Note that it really tries it best to parse an HTML, even broken HTML, it can be very and not very lenient depending on the underlying parser:

>>> from bs4 import BeautifulSoup
>>> html = """<html>
... <head><title>I'm title</title></head>
... </html>"""
>>> non_html = "This is not an html"
>>> bool(BeautifulSoup(html, "html.parser").find())
True
>>> bool(BeautifulSoup(non_html, "html.parser").find())
False

This basically tries to find any html element inside the string. If found – the result is True.

Another example with an HTML fragment:

>>> html = "Hello, <b>world</b>"
>>> bool(BeautifulSoup(html, "html.parser").find())
True

Alternatively, you can use lxml.html:

>>> import lxml.html
>>> html="Hello, <b>world</b>"
>>> non_html = "<ht fldf d><"
>>> lxml.html.fromstring(html).find('.//*') is not None
True
>>> lxml.html.fromstring(non_html).find('.//*') is not None
False

Leave a Comment

Hata!: SQLSTATE[HY000] [1045] Access denied for user 'divattrend_liink'@'localhost' (using password: YES)