Why does anyone use an encoding other than UTF-8? [closed]

Wikipedia lists advantages and disadvantages of UTF-8 compared to a variety of other encodings (http://en.wikipedia.org/wiki/UTF-8#Advantages_and_disadvantages). The most important disadvantages are, IMHO, that UTF-8 might use significantly more space, especially for Asian languages such as Chinese, Japanese or Hindi, and that not all code points have the same size, which makes measurements more difficult and … Read more
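To make the two disadvantages concrete, here is a minimal sketch in Rust using only the standard library. The helper names `encoded_sizes` and `utf8_widths` are made up for this illustration; the first compares the byte size of the same text in UTF-8 and UTF-16, the second shows that code points occupy between 1 and 4 bytes in UTF-8.

```rust
/// Returns (UTF-8 byte count, UTF-16 byte count) for the same text.
/// Hypothetical helper, for illustration only.
fn encoded_sizes(text: &str) -> (usize, usize) {
    // `str::len` is the length in bytes of the UTF-8 encoding.
    let utf8_bytes = text.len();
    // Each UTF-16 code unit is 2 bytes.
    let utf16_bytes = text.encode_utf16().count() * 2;
    (utf8_bytes, utf16_bytes)
}

/// Returns the UTF-8 byte width of every code point in `text`.
/// Hypothetical helper, for illustration only.
fn utf8_widths(text: &str) -> Vec<usize> {
    text.chars().map(char::len_utf8).collect()
}
```

For ASCII text such as "hello", UTF-8 needs half the bytes of UTF-16, while for a string of three kanji it needs 9 bytes against 6. The trade-off therefore depends heavily on what the text contains.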

Unicode range for Japanese

As zawhtut mentioned, this page has a reference for several Unicode ranges. To summarize the ranges: Japanese-style punctuation (3000–303f); Hiragana (3040–309f); Katakana (30a0–30ff); full-width roman characters and half-width katakana (ff00–ffef); CJK unified ideographs, i.e. common and uncommon kanji (4e00–9faf).
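The ranges above translate directly into a range match. The following is a rough sketch in Rust; the enum `JapaneseBlock` and the function `japanese_block` are names invented for this example, and the bounds are copied from the summary above as-is (note that the Unicode block CJK Unified Ideographs itself extends to 9FFF, so you may want to widen the last range).

```rust
/// Coarse classification based on the ranges listed above.
/// Hypothetical type, for illustration only.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum JapaneseBlock {
    Punctuation,
    Hiragana,
    Katakana,
    FullwidthHalfwidth,
    Kanji,
}

/// Returns the block a character falls into, or `None` if it is
/// outside all of the listed ranges.
fn japanese_block(c: char) -> Option<JapaneseBlock> {
    match c as u32 {
        0x3000..=0x303F => Some(JapaneseBlock::Punctuation),
        0x3040..=0x309F => Some(JapaneseBlock::Hiragana),
        0x30A0..=0x30FF => Some(JapaneseBlock::Katakana),
        0xFF00..=0xFFEF => Some(JapaneseBlock::FullwidthHalfwidth),
        // Upper bound taken from the summary above (assumption).
        0x4E00..=0x9FAF => Some(JapaneseBlock::Kanji),
        _ => None,
    }
}
```

A check such as `text.chars().any(|c| japanese_block(c).is_some())` is then enough to detect whether a string contains any character from these ranges. Keep in mind that kanji are shared with Chinese, so a match does not prove the text is Japanese.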

Monospace Unicode font

I’m also searching for a monospace font with rich Unicode coverage. So far I use DejaVu Sans Mono, but I wanted to know whether there is a better replacement (for me). So, as of today, I have downloaded the following TTF fonts and counted their glyphs: DejaVu Sans Mono: 3289; Everson Mono: 9671; Fixedsys Excelsior: 5993 … Read more

Color in the Unicode standard?

From the Unicode FAQ: Emoji and Dingbats, bolding mine: Q: What about characters whose name specifies a color? A: Some of the characters from the core emoji sets have names that include a color term, for example, BLUE HEART or ORANGE BOOK. These color terms in the names do not imply any requirement about how … Read more

What is unicode character 2028 (LS / Line Separator) used for?

Nicked from McDowell’s comment on the same page, and indirectly from the Unicode docs: Traditionally, NLF started out as a line separator (and sometimes record separator). It is still used as a line separator in simple text editors such as program editors. As platforms and programs started to handle word processing with automatic line-wrap, these … Read more
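As a small illustration of why U+2028 needs separate handling in practice: Rust's standard `str::lines` only splits on `\n` and `\r\n`, so a LINE SEPARATOR is treated as ordinary text unless you split on it yourself. The helper name `split_on_line_separator` below is invented for this sketch.

```rust
/// Splits text on U+2028 LINE SEPARATOR only.
/// Hypothetical helper, for illustration only.
fn split_on_line_separator(text: &str) -> Vec<&str> {
    text.split('\u{2028}').collect()
}

/// Counts lines the way `str::lines` sees them (`\n` and `\r\n` only).
fn std_line_count(text: &str) -> usize {
    text.lines().count()
}
```

With the input `"a\u{2028}b"`, `std_line_count` reports a single line while `split_on_line_separator` yields two pieces. Other languages and libraries draw this boundary differently, so it is worth checking what your line-splitting routine actually recognizes.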

How to iterate over Unicode grapheme clusters in Rust?

You want to use the unicode-segmentation crate:

    use unicode_segmentation::UnicodeSegmentation; // 1.5.0

    fn main() {
        for g in "नमस्ते्".graphemes(true) {
            println!("- {}", g);
        }
    }

(Playground; note: the playground editor can’t properly handle the string, so the cursor position is wrong in this one line.)

This prints:

    - न
    - म
    - स्
    - ते्

The … Read more
