Why does Java char use UTF-16?

Java used UCS-2 before transitioning to UTF-16 in 2004/2005. The original choice of UCS-2 was mainly historical: Unicode was originally designed as a fixed-width 16-bit character encoding. The primitive data type char in the Java programming language was intended to take advantage of this design by providing a simple data type that …
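To see why a fixed 16-bit char stopped being enough once Unicode grew past U+FFFF, here is a small sketch. It is written in Python rather than Java, and it assumes that counting UTF-16 code units is a fair stand-in for what a Java `char` sequence holds. The helper names `utf16_units` and `surrogate_pair` are hypothetical, introduced only for illustration. A character in the Basic Multilingual Plane fits in one 16-bit unit, while a supplementary character such as U+1F600 needs two (a surrogate pair).

```python
# Sketch (Python, not Java): count UTF-16 code units, which is roughly what a
# Java String stores as chars. Helper names are hypothetical.

def utf16_units(s):
    # UTF-16BE without a BOM: every code unit is exactly 2 bytes.
    return len(s.encode("utf-16-be")) // 2

def surrogate_pair(cp):
    # Split a supplementary code point (U+10000..U+10FFFF) into a
    # high surrogate and a low surrogate, per the UTF-16 scheme.
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    v = cp - 0x10000
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)

bmp = "\u00e9"         # é, inside the Basic Multilingual Plane
astral = "\U0001F600"  # 😀, outside the BMP

print(utf16_units(bmp))     # 1: one 16-bit unit, like one Java char
print(utf16_units(astral))  # 2: needs a surrogate pair
print([hex(u) for u in surrogate_pair(0x1F600)])  # ['0xd83d', '0xde00']
```

In Java terms, this is why `length()` on a string holding only that emoji reports 2 chars even though it is one code point.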

Python and BeautifulSoup encoding issues [duplicate]

In your case, this page contains invalid UTF-8 data, which confuses BeautifulSoup and makes it think that your page uses windows-1252. You can use this trick: soup = BeautifulSoup.BeautifulSoup(content.decode('utf-8', 'ignore')). Doing this discards any invalid byte sequences from the page source, and BeautifulSoup will guess the encoding correctly. You can replace 'ignore' with 'replace' …