Best output type and encoding practices for repr() functions?

Question

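In Python 2, __repr__ (and __str__) must effectively return a byte string (str), not a unicode object. A unicode return value gets encoded with the default ASCII codec, so any non-ASCII character raises UnicodeEncodeError. In Python 3 the situation is reversed: __repr__ and __str__ must return text (str) objects, not bytes (née str) objects. Under Python 2: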

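# Python 2
class Foo(object):
    def __repr__(self):
        # unicode return value: repr() encodes it with the default (ASCII) codec
        return u'\N{WHITE SMILING FACE}'

class Bar(object):
    def __repr__(self):
        # byte-string return value, already encoded as UTF-8
        return u'\N{WHITE SMILING FACE}'.encode('utf8')

print repr(Bar())
# ☺  (on a UTF-8 terminal; the raw value is '\xe2\x98\xba')
repr(Foo())
# UnicodeEncodeError: 'ascii' codec can't encode character u'\u263a' in position 0: ordinal not in range(128)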
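Here is a rough Python 3 sketch of the same two classes for comparison. It must be run under Python 3, not Python 2. Foo, which returns text, now works. Bar, which returns bytes, is rejected by repr() with a TypeError.

```python
class Foo:
    def __repr__(self):
        # Python 3: __repr__ must return str (text), so this is accepted
        return '\N{WHITE SMILING FACE}'


class Bar:
    def __repr__(self):
        # Python 3: returning bytes is rejected by repr()
        return '\N{WHITE SMILING FACE}'.encode('utf8')


smiley = repr(Foo())  # the one-character string '\u263a', i.e. ☺

try:
    repr(Bar())
    bar_failed = False
except TypeError:
    bar_failed = True  # repr() refuses a bytes return value
```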

In Python 2 you don’t really have a choice: you have to pick an encoding for the
return value of __repr__.
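Code that has to run under both versions can make that choice explicit. One approach is to build the repr as text and encode it only on Python 2. This is a minimal sketch. The class name Smiley, the helper _repr_text, and the choice of UTF-8 are illustrative assumptions, not an established convention.

```python
import sys

# True only when running under Python 2
PY2 = sys.version_info[0] == 2


class Smiley(object):
    """Hypothetical example class whose repr contains non-ASCII text."""

    def _repr_text(self):
        # Build the representation as text (unicode on Py2, str on Py3)
        return u'<Smiley \N{WHITE SMILING FACE}>'

    def __repr__(self):
        text = self._repr_text()
        if PY2:
            # Python 2: repr() wants a byte string, so pick an encoding here
            return text.encode('utf-8')
        # Python 3: repr() wants text, so return it unencoded
        return text
```

On Python 3, repr(Smiley()) is the text '<Smiley ☺>'. On Python 2 it is the UTF-8 encoding of that text, which displays correctly only where the output is also treated as UTF-8.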

By the way, have you read the PrintFails wiki? It may not directly answer
your other questions, but I did find it helpful in illuminating why certain
errors occur.

When using from __future__ import unicode_literals,

'<{}>'.format(repr(x).decode('utf-8')).encode('utf-8')

can be more simply written as

str('<{}>').format(repr(x))

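This works because '<{}>' is pure ASCII, so str() can turn it into a byte string with Python 2's default ASCII codec. format() then inserts the bytes of repr(x) unchanged, so the result is the same as the longer expression whenever repr(x) returns UTF-8 bytes.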
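The shorter form gives the same bytes for a simple reason. Take any byte string that is valid UTF-8, decode it, format it, and re-encode it: you get back exactly the bytes you would get by splicing the original bytes in directly. Below is a small Python 3 check of that round trip. The helper names are hypothetical, and because Python 3 bytes have no .format() method, the direct splice uses concatenation instead.

```python
def wrap_via_text(r):
    # The long form: decode to text, format, then encode back to UTF-8
    return '<{}>'.format(r.decode('utf-8')).encode('utf-8')


def wrap_bytes(r):
    # What the short form amounts to: splice the bytes in unchanged
    return b'<' + r + b'>'


r = '\N{WHITE SMILING FACE}'.encode('utf-8')  # b'\xe2\x98\xba'
assert wrap_via_text(r) == wrap_bytes(r) == b'<\xe2\x98\xba>'
```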

Without from __future__ import unicode_literals, the expression can be written as:

'<{}>'.format(repr(x))
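In Python 3 there is no byte/text split to manage here. repr() returns text, so '<{}>'.format(repr(x)) works directly, and the !r conversion in a format string is a shorter spelling of the same thing. A brief sketch follows; the Wrapper class is hypothetical.

```python
class Wrapper:
    """Hypothetical container whose repr wraps the repr of its value."""

    def __init__(self, value):
        self.value = value

    def __repr__(self):
        # '{!r}' applies repr() to the argument, like '<{}>'.format(repr(...))
        return '<{!r}>'.format(self.value)


w = Wrapper('\N{WHITE SMILING FACE}')
assert repr(w) == '<{}>'.format(repr(w.value))
```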
