UTF8 vs. UTF16 vs. char* vs. what? Someone explain this mess to me!

Check out Joel Spolsky’s “The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)”. EDIT 20140523: Also, watch “Characters, Symbols and the Unicode Miracle” by Tom Scott on YouTube. It’s just under ten minutes, and a wonderful explanation of the brilliant ‘hack’ that is UTF-8.

Multibyte trim in PHP?

The standard trim function trims a handful of space and space-like characters. These are defined as ASCII characters, which means specific single bytes in the range 0xxx xxxx (0x00 to 0x7F). Proper UTF-8 input will never contain multi-byte characters that are made up of bytes 0xxx xxxx. All the bytes in proper UTF-8 multibyte characters start with 1xxx … Read more
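The point above can be checked with a small sketch. It is written in Ruby purely for illustration (the question itself is about PHP), and `ASCII_TRIM_BYTES`, `multibyte_bytes` and `bytewise_trim` are hypothetical names, not part of any library. It assumes the input is valid UTF-8 and uses the six characters that PHP's `trim()` strips by default.

```ruby
# The bytes PHP's trim() strips by default: space, tab, LF, CR, NUL, vertical tab.
ASCII_TRIM_BYTES = " \t\n\r\0\v".bytes

# Collect every byte that belongs to a multi-byte character.
def multibyte_bytes(str)
  str.each_char.select { |ch| ch.bytesize > 1 }.flat_map(&:bytes)
end

# Trim byte by byte, without any knowledge of character boundaries.
def bytewise_trim(str)
  bytes = str.bytes
  bytes.shift while bytes.first && ASCII_TRIM_BYTES.include?(bytes.first)
  bytes.pop   while bytes.last  && ASCII_TRIM_BYTES.include?(bytes.last)
  bytes.pack("C*").force_encoding(str.encoding)
end

sample = "  \u00E7\u00F6\u011F \u65E5\u672C\u8A9E\t\n"

# Every byte of a multi-byte character has its high bit set (1xxx xxxx),
# so none of them can be mistaken for an ASCII whitespace byte.
puts multibyte_bytes(sample).all? { |b| b >= 0x80 }

trimmed = bytewise_trim(sample)
puts trimmed.valid_encoding?  # the trimmed string is still well-formed UTF-8
puts trimmed
```

Because the trimmed bytes can only ever be whole single-byte characters, a byte-oriented trim leaves the multi-byte characters in the middle intact.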

What is a multibyte character set?

The term is ambiguous, but in my internationalization work, we typically avoided using the term “multibyte character sets” to refer to Unicode-based encodings. Generally, we used the term only for legacy encoding schemes that used one or more bytes to encode each character (excluding encodings that require only one byte per character). Shift-JIS, JIS, EUC-JP, EUC-KR, … Read more
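To make “one or more bytes per character” concrete, here is a minimal Ruby sketch that counts the bytes each character occupies in a given encoding. The helper name `bytes_per_char` and the sample string are made up for this illustration; the encoding name `Shift_JIS` is the one Ruby's standard `String#encode` accepts.

```ruby
# Return the number of bytes each character occupies in the given encoding.
def bytes_per_char(str, encoding)
  str.encode(encoding).each_char.map(&:bytesize)
end

# "a", U+65E5, "b", U+672C: two ASCII letters interleaved with two kanji.
mixed = "a\u65E5b\u672C"

p bytes_per_char(mixed, "Shift_JIS")  # ASCII takes 1 byte, each kanji takes 2
p bytes_per_char(mixed, "UTF-8")      # ASCII takes 1 byte, each kanji takes 3
```

In both encodings the width varies from character to character, which is why the term on its own does not tell you whether a legacy scheme or a Unicode encoding is meant.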

Ruby 1.9: how can I properly upcase & downcase multibyte strings?

For anybody coming from Google by searching for “ruby upcase utf8”:

    > "your problem chars here çöğıü Iñtërnâtiônàlizætiøn".mb_chars.upcase.to_s
    => "YOUR PROBLEM CHARS HERE ÇÖĞIÜ IÑTËRNÂTIÔNÀLIZÆTIØN"

The solution is to use mb_chars.

Documentation:
https://www.rubydoc.info/gems/activesupport/String#mb_chars-instance_method
https://api.rubyonrails.org/classes/ActiveSupport/Multibyte/Chars.html

How does UTF-8 “variable-width encoding” work?

Each byte starts with a few bits that tell you whether it’s a single-byte code point, a multi-byte code point, or a continuation of a multi-byte code point. Like this:

0xxx xxxx    A single-byte US-ASCII code (from the first 128 characters)

The multi-byte code points each start with a few bits that essentially say “hey, you … Read more
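As a rough sketch of the idea, the Ruby snippet below classifies each byte of a string by its leading bits. The excerpt above is truncated, so the lead-byte and continuation patterns in the code come from the standard UTF-8 layout rather than from the answer itself, and `utf8_byte_kind` is a hypothetical helper name.

```ruby
# Classify one byte (0..255) by its leading bits, following the UTF-8 layout.
def utf8_byte_kind(byte)
  if    (byte & 0b1000_0000) == 0b0000_0000 then :single        # 0xxx xxxx
  elsif (byte & 0b1100_0000) == 0b1000_0000 then :continuation  # 10xx xxxx
  elsif (byte & 0b1110_0000) == 0b1100_0000 then :lead_2        # 110x xxxx
  elsif (byte & 0b1111_0000) == 0b1110_0000 then :lead_3        # 1110 xxxx
  elsif (byte & 0b1111_1000) == 0b1111_0000 then :lead_4        # 1111 0xxx
  else :invalid                                                 # never used in UTF-8
  end
end

# "a" (1 byte), U+00E9 (2 bytes), U+20AC (3 bytes)
text = "a\u00E9\u20AC"
p text.bytes.map { |b| utf8_byte_kind(b) }
# => [:single, :lead_2, :continuation, :lead_3, :continuation, :continuation]
```

A decoder that lands in the middle of a stream can skip `:continuation` bytes until it sees a `:single` or `:lead_*` byte, which is what makes the encoding self-synchronizing.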
