A good way to get the charset/encoding of an HTTP response in Python
To parse an HTTP header you can use cgi.parse_header() (Python 2 shown here):

    import cgi

    # parse_header returns the main value and a dict of parameters
    _, params = cgi.parse_header('text/html; charset=utf-8')
    print params['charset']  # -> utf-8

Or, using the response object:

    import urllib2

    response = urllib2.urlopen('http://example.com')
    response_encoding = response.headers.getparam('charset')
    # or in Python 3: response.headers.get_content_charset(default)

In general the server may lie about the encoding or not report it at all (the default depends on …
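For Python 3, where urllib2 no longer exists and the cgi module was removed in 3.13, the same idea can be sketched with the standard library's email.message.Message, which exposes the get_content_charset() method mentioned above. This is a minimal sketch, not the original answer's code: the helper names charset_from_content_type and decode_body, and the choice of UTF-8 as a fallback, are assumptions made for illustration.

```python
from email.message import Message


def charset_from_content_type(content_type, default=None):
    """Return the charset parameter of a Content-Type value, or default."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lowercases the value and returns the
    # given default when no charset parameter is present.
    return msg.get_content_charset(default)


def decode_body(body, content_type, fallback="utf-8"):
    """Decode bytes using the declared charset, falling back if it is wrong.

    The fallback encoding is an assumption; pick whatever suits your data.
    """
    charset = charset_from_content_type(content_type, fallback)
    try:
        return body.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # The server reported an unknown or incorrect encoding.
        return body.decode(fallback, errors="replace")


if __name__ == "__main__":
    print(charset_from_content_type("text/html; charset=utf-8"))
    print(decode_body(b"caf\xc3\xa9", "text/html; charset=utf-8"))
```

With a real response from urllib.request.urlopen(), response.headers is already a message object, so response.headers.get_content_charset(default) can be called directly without the first helper. The try/except in decode_body is there because, as noted above, the declared encoding cannot always be trusted.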