Python 3: os.walk() file paths UnicodeEncodeError: 'utf-8' codec can't encode: surrogates not allowed

On Linux, filenames are ‘just a bunch of bytes’, and are not necessarily encoded in a particular encoding. Python 3 tries to turn everything into Unicode strings. In doing so the developers came up with a scheme to translate byte strings to Unicode strings and back without loss, and without knowing the original encoding. They … Read more
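The lossless scheme mentioned above can be illustrated with Python's `surrogateescape` error handler (PEP 383). This is a minimal sketch, not the original answer's code: the byte string `raw` is a made-up filename, and plain UTF-8 stands in for the filesystem encoding that `os.fsdecode()` and `os.fsencode()` would use.

```python
# Hypothetical filename: Latin-1 bytes that are not valid UTF-8.
raw = b"caf\xe9.txt"

# Undecodable bytes become lone surrogates (0xE9 -> U+DCE9).
name = raw.decode("utf-8", errors="surrogateescape")
print(ascii(name))

# A strict encode refuses lone surrogates: this is the error in the title.
try:
    name.encode("utf-8")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# Encoding with the same handler restores the original bytes exactly.
round_trip = name.encode("utf-8", errors="surrogateescape")

# For display or logging, substitute the unencodable characters instead.
printable = name.encode("utf-8", errors="replace").decode("utf-8")
print(printable)
```

Paths returned by `os.walk()` can carry such surrogates, so they are safe to pass back to file APIs but need a step like the last one before being printed or written as strict UTF-8.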

Convert between string, u16string & u32string

mbstowcs() and wcstombs() don’t necessarily convert to UTF-16 or UTF-32; they convert to wchar_t, in whatever encoding the locale uses for wchar_t. All Windows locales use a two-byte wchar_t with UTF-16 as the encoding, but the other major platforms use a 4-byte wchar_t with UTF-32 (or even a non-Unicode encoding for some locales). A platform … Read more
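The difference between a two-byte UTF-16 unit and a 4-byte UTF-32 unit can be seen without any locale machinery. The following is a Python sketch rather than the C++ conversion the answer discusses; the sample string is an arbitrary choice for illustration.

```python
# Sample text: ASCII letter, euro sign (BMP), musical G clef (outside the BMP).
text = "a\u20ac\U0001D11E"

utf16 = text.encode("utf-16-le")  # 2-byte code units, as on Windows
utf32 = text.encode("utf-32-le")  # 4-byte code units, as on most other platforms

utf16_units = len(utf16) // 2
utf32_units = len(utf32) // 4

# UTF-32 needs one unit per code point; UTF-16 needs a surrogate pair
# (two units) for the character outside the BMP.
print(len(text), utf16_units, utf32_units)
```

Code that assumes one wide character per code point therefore behaves differently depending on which width the platform uses.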

Java Unicode String length

Found a solution to your problem. Based on this SO answer I made a program that uses regex character classes to search for letters that may have optional modifiers. It splits your string into single (combined if necessary) characters and puts them into a list: import java.util.*; import java.lang.*; import java.util.regex.*; class Main { public … Read more
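The same idea, a base character followed by optional modifiers, can be sketched in Python. This is not the answer's Java program: `split_clusters` is a hypothetical helper that attaches every character in a Unicode mark category (Mn, Mc, Me) to the character before it. It is a simplification and does not implement full grapheme cluster segmentation.

```python
import unicodedata


def split_clusters(text):
    """Group each base character with the combining marks that follow it."""
    clusters = []
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")
        if clusters and is_mark:
            clusters[-1] += ch   # attach the mark to the previous character
        else:
            clusters.append(ch)  # start a new cluster
    return clusters


# "e" + COMBINING ACUTE ACCENT, then "a" + COMBINING DIAERESIS, then "z"
sample = "e\u0301a\u0308z"
clusters = split_clusters(sample)
print(len(sample), len(clusters))
```

The length of the resulting list is the count of user-visible letters under this simplified rule, which can be smaller than the string's own length.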

What is the range of Unicode Printable Characters?

See http://en.wikipedia.org/wiki/Unicode_control_characters. You might want to look especially at the C0 and C1 control characters: http://en.wikipedia.org/wiki/C0_and_C1_control_codes. The wiki says the C0 control characters are in the range U+0000–U+001F plus U+007F (the same code points as the ASCII control characters), and the C1 control characters are in the range U+0080–U+009F. Other than the C-control characters, Unicode also has hundreds of formatting … Read more
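The C0 and C1 ranges quoted above can be checked against Python's Unicode database. This is a small sketch, assuming the general category `Cc` is the right test for a control character and `Cf` for a format character; `is_control` and `is_format` are illustrative helper names.

```python
import unicodedata


def is_control(ch):
    """True for C0/C1 control characters (general category Cc)."""
    return unicodedata.category(ch) == "Cc"


def is_format(ch):
    """True for invisible format characters (general category Cf)."""
    return unicodedata.category(ch) == "Cf"


# Collect the control code points in the first 256 code points.
controls = [cp for cp in range(0x100) if is_control(chr(cp))]

# The ranges from the text: U+0000-U+001F, U+007F, U+0080-U+009F.
expected = list(range(0x00, 0x20)) + [0x7F] + list(range(0x80, 0xA0))
print(controls == expected)

# LEFT-TO-RIGHT MARK is a format character, not a control character.
print(is_format("\u200e"), is_control("\u200e"))
```

A filter for printable text would typically have to exclude both categories, along with anything else the application treats as non-printing.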

Why is the length of this string longer than the number of characters in it?

Everyone else is giving the surface answer, but there’s a deeper rationale too: the number of “characters” is a difficult-to-define question and can be surprisingly expensive to compute, whereas a length property should be fast. Why is it difficult to define? Well, there are a few options, and none is really more valid than another: The … Read more
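A short Python sketch shows how the possible definitions of length disagree for one string. The sample text is an arbitrary choice, and the three counts shown are only some of the candidate definitions.

```python
# "e" + COMBINING ACUTE ACCENT, followed by an emoji outside the BMP.
text = "e\u0301\U0001F600"

code_points = len(text)                           # what Python's len() counts
utf16_units = len(text.encode("utf-16-le")) // 2  # what a UTF-16 string length counts
utf8_bytes = len(text.encode("utf-8"))            # storage size in UTF-8

# A reader sees two characters, yet none of the cheap counts says 2.
print(code_points, utf16_units, utf8_bytes)
```

Each of these counts is cheap to obtain from the stored data, whereas counting what a reader perceives as characters requires scanning the text with Unicode segmentation rules.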
