How to remove \xa0 from string in Python?

\xa0 is the non-breaking space in Latin-1 (ISO 8859-1), also chr(160). You should replace it with a regular space:

    string = string.replace(u'\xa0', u' ')

Calling .encode('utf-8') encodes the Unicode string to UTF-8, in which every code point is represented by 1 to 4 bytes. In this case, \xa0 is represented by the 2 bytes \xc2\xa0. Read …
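As a minimal sketch of the replacement and of the byte representations mentioned above, assuming Python 3 (where the u prefix is optional); the sample string is made up for illustration:

```python
# Hypothetical sample text containing a non-breaking space (U+00A0).
text = "price:\xa0100"

# Replace the non-breaking space with an ordinary space.
cleaned = text.replace("\xa0", " ")

# The same character encoded two ways.
nbsp_utf8 = "\xa0".encode("utf-8")      # two bytes in UTF-8
nbsp_latin1 = "\xa0".encode("latin-1")  # one byte in Latin-1

print(repr(cleaned))
print(nbsp_utf8, nbsp_latin1)
```

If the data arrives as UTF-8 bytes rather than text, decoding it first and then replacing "\xa0" avoids having to search for the two-byte sequence by hand.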

Url decode UTF-8 in Python

The data is UTF-8 encoded bytes escaped with URL quoting, so you want to decode it with urllib.parse.unquote(), which handles decoding from percent-encoded data to UTF-8 bytes and then to text, transparently:

    from urllib.parse import unquote
    url = unquote(url)

Demo:

    >>> from urllib.parse import unquote
    >>> url = "example.com?title=%D0%BF%D1%80%D0%B0%D0%B2%D0%BE%D0%B2%D0%B0%D1%8F+%D0%B7%D0%B0%D1%89%D0%B8%D1%82%D0%B0"
    >>> unquote(url)
    'example.com?title=правовая+защита'

The Python 2 equivalent is urllib.unquote(), …
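Note that the demo output still contains a literal +. As a hedged sketch, assuming the + stands for a space (as it does in form-encoded query strings), urllib.parse.unquote_plus() decodes it as well; the variable names here are illustrative only:

```python
from urllib.parse import unquote, unquote_plus

# Query fragment taken from the demo above.
raw = ("title=%D0%BF%D1%80%D0%B0%D0%B2%D0%BE%D0%B2%D0%B0%D1%8F"
       "+%D0%B7%D0%B0%D1%89%D0%B8%D1%82%D0%B0")

plain = unquote(raw)        # percent-escapes decoded, '+' left untouched
spaced = unquote_plus(raw)  # additionally turns '+' into a space

print(plain)
print(spaced)
```

Whether + should become a space depends on where the string came from, so unquote() remains the safer default for path components.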

How to get UTF-8 working in Java webapps?

Answering myself as the FAQ of this site encourages it. This works for me:

Mostly, the characters äåö are not problematic, as the default character set used by browsers and Tomcat/Java for webapps is Latin-1, i.e. ISO-8859-1, which "understands" those characters.

Getting UTF-8 working under Java+Tomcat+Linux/Windows+MySQL requires the following:

Configuring Tomcat's server.xml

It's necessary …

Unicode (UTF-8) reading and writing to files in Python

Rather than mess with .encode and .decode, specify the encoding when opening the file. The io module, added in Python 2.6, provides an io.open function, which allows specifying the file's encoding. Supposing the file is encoded in UTF-8, we can use:

    >>> import io
    >>> f = io.open("test", mode="r", encoding="utf-8")

Then f.read returns a decoded …
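A small round-trip sketch of this approach, assuming a throwaway file in a temporary directory (the path and sample text are made up for illustration). In Python 3, io.open is the same function as the built-in open:

```python
import io
import os
import tempfile

# Hypothetical sample text with non-ASCII characters.
sample = "äåö and \u20ac"
path = os.path.join(tempfile.mkdtemp(), "test.txt")

# Write text; the file object encodes it to UTF-8 on the way out.
with io.open(path, mode="w", encoding="utf-8") as f:
    f.write(sample)

# Read it back; the file object decodes UTF-8 into text.
with io.open(path, mode="r", encoding="utf-8") as f:
    round_tripped = f.read()

# Reading in binary mode shows the raw UTF-8 bytes on disk.
with io.open(path, mode="rb") as f:
    raw = f.read()

print(round_tripped == sample, len(sample), len(raw))
```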

Setting the default Java character encoding

Unfortunately, the file.encoding property has to be specified as the JVM starts up; by the time your main method is entered, the character encoding used by String.getBytes() and the default constructors of InputStreamReader and OutputStreamWriter has been permanently cached. As Edward Grech points out, in a special case like this, the environment variable JAVA_TOOL_OPTIONS can …

What is the difference between utf8mb4 and utf8 charsets in MySQL?

UTF-8 is a variable-length encoding. In the case of UTF-8, this means that storing one code point requires one to four bytes. However, MySQL's encoding called "utf8" (alias of "utf8mb3") only stores a maximum of three bytes per code point. So the character set "utf8"/"utf8mb3" cannot store all Unicode code points: it only supports the …
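The byte counts can be checked from Python. This is only an illustrative sketch: utf8_len and fits_in_utf8mb3 are hypothetical helper names, and the check mirrors the three-byte limit described above rather than anything MySQL itself exposes:

```python
def utf8_len(ch):
    """Number of bytes one character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def fits_in_utf8mb3(text):
    """True if every character needs at most three bytes in UTF-8."""
    return all(utf8_len(ch) <= 3 for ch in text)


# One sample character for each encoded length.
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    print("U+%04X" % ord(ch), utf8_len(ch))

print(fits_in_utf8mb3("h\u00e9llo \u20ac"))  # three bytes or fewer each
print(fits_in_utf8mb3("hi \U0001F600"))      # contains a four-byte character
```

A check like this could be run on text before inserting it into a column that still uses the three-byte character set.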
