Custom indent width for BeautifulSoup .prettify()

I actually dealt with this myself, in the hackiest way possible: by post-processing the result.

r = re.compile(r'^(\s*)', re.MULTILINE)
def prettify_2space(s, encoding=None, formatter="minimal"):
    return r.sub(r'\1\1', s.prettify(encoding, formatter))

Actually, I monkeypatched prettify_2space in place of prettify in the class. That’s not essential to the solution, but let’s do it anyway, and make the indent width a parameter instead of hardcoding it to 2:

orig_prettify = bs4.BeautifulSoup.prettify
r = re.compile(r'^(\s*)', re.MULTILINE)
def prettify(self, encoding=None, formatter="minimal", indent_width=4):
    return r.sub(r'\1' * indent_width, orig_prettify(self, encoding, formatter))
bs4.BeautifulSoup.prettify = prettify

So:

x = '''<section><article><h1></h1><p></p></article></section>'''
soup = bs4.BeautifulSoup(x)
print(soup.prettify(indent_width=3))

… gives:

<html>
   <body>
      <section>
         <article>
            <h1>
            </h1>
            <p>
            </p>
         </article>
      </section>
   </body>
</html>

Obviously if you want to patch Tag.prettify as well as BeautifulSoup.prettify, you have to do the same thing there. (You might want to create a generic wrapper that you can apply to both, instead of repeating yourself.) And if there are any other prettify methods, same deal.

Leave a Comment

Hata!: SQLSTATE[HY000] [1045] Access denied for user 'divattrend_liink'@'localhost' (using password: YES)