Rather than mess with .encode and .decode, specify the encoding when opening the file. The io module, added in Python 2.6, provides an io.open function, which allows specifying the file’s encoding.
Supposing the file is encoded in UTF-8, we can use:
>>> import io
>>> f = io.open("test", mode="r", encoding="utf-8")
Then f.read returns a decoded Unicode object:
>>> f.read()
u'Capit\xe1l\n\n'
In 3.x, the io.open function is an alias for the built-in open function, which supports the encoding argument (it does not in 2.x).
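As a self-contained sketch of the io.open approach that should run unchanged on 2.6+ and 3.x: the temporary path below is a hypothetical stand-in for the "test" file, and the sample bytes are written first only so the snippet does not depend on an existing file.

```python
import io
import os
import tempfile

# Hypothetical temporary file standing in for the "test" file above.
path = os.path.join(tempfile.mkdtemp(), "test")

# Write the raw UTF-8 bytes so the example is self-contained.
with io.open(path, mode="wb") as f:
    f.write(u"Capit\xe1l\n\n".encode("utf-8"))

# io.open decodes while reading; the with block closes the file afterwards.
with io.open(path, mode="r", encoding="utf-8") as f:
    text = f.read()

print(repr(text))
```

Using a with block is a design choice rather than a requirement; it just ensures the file is closed even if decoding raises an error.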
We can also use open from the codecs standard library module:
>>> import codecs
>>> f = codecs.open("test", "r", "utf-8")
>>> f.read()
u'Capit\xe1l\n\n'
Note, however, that this can cause problems when mixing read() and readline().
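If you do need to mix the two calls, a rough sketch with io.open looks like the following. The temporary path and sample content are assumptions made only to keep the snippet self-contained, and this shows one small case rather than a guarantee about every file.

```python
import io
import os
import tempfile

# Hypothetical temporary file with the same sample content as above.
path = os.path.join(tempfile.mkdtemp(), "test")
with io.open(path, mode="wb") as f:
    f.write(u"Capit\xe1l\n\n".encode("utf-8"))

with io.open(path, mode="r", encoding="utf-8") as f:
    first_line = f.readline()  # reads up to and including the first newline
    rest = f.read()            # reads whatever remains after that line

print(repr(first_line))
print(repr(rest))
```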