PyTidyLib is a nice Python binding for HTML Tidy. The example from its documentation:

    from tidylib import tidy_document

    # tidy_document returns the cleaned-up markup and Tidy's error/warning report
    document, errors = tidy_document('''<p>fõo <img src="bar.jpg">''',
                                     options={'numeric-entities': 1})
    print(document)
    print(errors)

Moreover, it's compatible with both legacy HTML Tidy and the new tidy-html5.
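
If you want to act on the warnings rather than just print them, a small wrapper along these lines may help. This is only a sketch: `error_lines` and `tidy_with_report` are hypothetical helper names, not part of PyTidyLib, and it assumes the error report comes back as a single newline-separated string, as in the example above.

```python
def error_lines(errors):
    """Split Tidy's newline-separated report into a list of non-empty lines."""
    return [line.strip() for line in errors.splitlines() if line.strip()]


def tidy_with_report(html, options=None):
    """Run tidy_document and return (cleaned document, list of report lines).

    Hypothetical convenience wrapper, not a PyTidyLib function.
    """
    # Imported here so error_lines() stays usable without libtidy installed.
    from tidylib import tidy_document

    document, errors = tidy_document(html, options=options)
    return document, error_lines(errors)
```

For example, `tidy_with_report('<p>fõo <img src="bar.jpg">', {'numeric-entities': 1})` should give back the tidied document plus one list entry per warning, which is easier to filter or log than the raw string.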