Why does Java char use UTF-16?

Java used UCS-2 before transitioning to UTF-16 in 2004/2005. The reason for the original choice of UCS-2 is mainly historical: Unicode was originally designed as a fixed-width 16-bit character encoding. The primitive data type char in the Java programming language was intended to take advantage of this design by providing a simple data type that … Read more
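The practical consequence is easy to see in code. Below is a minimal, self-contained sketch (my own illustration, not part of the linked answer; the class name is hypothetical) showing that a BMP character fits in one char, while a supplementary code point such as U+1F600 is stored in a String as a surrogate pair of two chars:

```java
public class CharWidthDemo {
    public static void main(String[] args) {
        // A BMP character fits in a single 16-bit char.
        char a = 'A'; // U+0041
        System.out.println((int) a); // 65

        // A supplementary character such as U+1F600 (GRINNING FACE)
        // does not fit in one char; inside a String it is stored as
        // a surrogate pair of two UTF-16 code units.
        String emoji = "\uD83D\uDE00"; // U+1F600 written as its surrogate pair
        System.out.println(emoji.length());                             // 2 code units
        System.out.println(emoji.codePointCount(0, emoji.length()));    // 1 code point
        System.out.println(Character.isHighSurrogate(emoji.charAt(0))); // true
    }
}
```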

UTF8 vs. UTF16 vs. char* vs. what? Someone explain this mess to me!

Check out Joel Spolsky’s The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!). EDIT 20140523: Also watch Characters, Symbols and the Unicode Miracle by Tom Scott on YouTube: it’s just under ten minutes, and a wonderful explanation of the brilliant ‘hack’ that is UTF-8.
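If it helps to see the size difference concretely, here is a small sketch in Java (my own illustration, not from either resource; the class name is hypothetical) that encodes the same strings both ways; UTF-8 spends 1–4 bytes per code point, UTF-16 spends 2 or 4:

```java
import java.nio.charset.StandardCharsets;

public class EncodingSizes {
    public static void main(String[] args) {
        // The same text costs a different number of bytes per encoding.
        String[] samples = { "A", "é", "€", "\uD83D\uDE00" }; // U+0041, U+00E9, U+20AC, U+1F600
        for (String s : samples) {
            int utf8  = s.getBytes(StandardCharsets.UTF_8).length;
            int utf16 = s.getBytes(StandardCharsets.UTF_16BE).length; // BE variant, so no BOM is prepended
            System.out.printf("%s -> UTF-8: %d bytes, UTF-16: %d bytes%n", s, utf8, utf16);
        }
    }
}
```

ASCII stays at one byte in UTF-8 (part of the ‘hack’: every ASCII file is already valid UTF-8), while UTF-16 pays two bytes for it.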

JavaScript strings outside of the BMP

Depends what you mean by ‘support’. You can certainly put non-UCS-2 characters in a JS string using surrogates, and browsers will display them if they can. But each item in a JS string is a separate UTF-16 code unit. There is no language-level support for handling full characters: all the standard String members (length, split, … Read more
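Java strings use the same UTF-16 code-unit model, so the point can be demonstrated there (a sketch of my own, not from the answer; the JS equivalents of charAt and length are str.charCodeAt and str.length):

```java
public class CodeUnitDemo {
    public static void main(String[] args) {
        // U+1D54A (MATHEMATICAL DOUBLE-STRUCK CAPITAL S) lies outside the BMP.
        String s = "\uD835\uDD4A";

        // Indexed operations see two separate surrogate code units,
        // just as JavaScript's str.length and str.charCodeAt would.
        System.out.println(s.length());        // 2
        System.out.println((int) s.charAt(0)); // 55349 (0xD835, high surrogate)

        // Only the codePoint-aware methods treat it as one character.
        System.out.println(Integer.toHexString(s.codePointAt(0))); // 1d54a
        System.out.println(s.codePointCount(0, s.length()));       // 1
    }
}
```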

Why does the Java char primitive take up 2 bytes of memory?

When Java was originally designed, it was anticipated that any Unicode character would fit in 2 bytes (16 bits), so char and Character were designed accordingly. In fact, a Unicode character can now require up to 4 bytes. Thus, UTF-16, the internal Java encoding, requires that supplementary characters use 2 code units. Characters in the Basic … Read more
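A short sketch of that accounting (mine, not the answer’s; the class name is illustrative), using Character.charCount to ask how many code units a code point needs:

```java
public class CharCountDemo {
    public static void main(String[] args) {
        int bmp = 0x0041;            // 'A', in the Basic Multilingual Plane
        int supplementary = 0x1F600; // GRINNING FACE, outside the BMP

        // charCount reports the number of chars (UTF-16 code units)
        // needed: 1 for BMP code points, 2 for supplementary ones.
        System.out.println(Character.charCount(bmp));           // 1
        System.out.println(Character.charCount(supplementary)); // 2

        // Building a String from the raw code point yields the surrogate pair.
        String s = new StringBuilder().appendCodePoint(supplementary).toString();
        System.out.println(s.length()); // 2 chars for 1 character
    }
}
```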
