What does sorting mean in non-alphabetic (i.e., Asian) languages?

Does one double-byte character really get compared against the other in a sort function?

The native String type in JavaScript is based on UTF-16 code units, and that’s what gets compared. For characters in the Basic Multilingual Plane (which all these are), this is the same as Unicode code points.
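As a rough illustration of the code-unit comparison described above, the sketch below sorts a few strings with the default comparator. The sample values are my own and are not taken from the question.

```javascript
// Sample strings chosen for illustration only.
const words = ["か", "あ", "ア", "a", "Z"];

// With no comparator, sort() orders strings by their UTF-16 code units.
const sorted = [...words].sort();

// The first code unit of each string, in hex, shows why they land where they do.
const firstUnits = sorted.map((w) =>
  w.charCodeAt(0).toString(16).toUpperCase().padStart(4, "0")
);

console.log(sorted);     // [ 'Z', 'a', 'あ', 'か', 'ア' ]
console.log(firstUnits); // [ '005A', '0061', '3042', '304B', '30A2' ]

// Outside the BMP, code units and code points can disagree:
// U+2000B is stored as the surrogate pair D840 DC0B, and 0xD840 < 0xFF71.
const astral = "\u{2000B}"; // a CJK Extension B ideograph
const halfwidth = "\uFF71"; // halfwidth katakana A
const codeUnitOrder = astral < halfwidth; // true
const codePointOrder = astral.codePointAt(0) < halfwidth.codePointAt(0); // false
```

The last two lines are only there to show where the "same as code points" equivalence stops holding; for BMP-only text it does not come into play.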

The term ‘double-byte’, as in encodings like Shift-JIS, has no meaning in a web context: DOM and JavaScript strings are natively Unicode, and the original bytes of the encoded page received by the browser are long gone.

Does the result of such a sort mean anything at all?

Little. Unicode code points do not claim to offer any particular ordering, for one because there is no globally accepted ordering. Even for the most basic case of ASCII Latin characters, languages disagree (e.g. on whether v and w are the same letter, or whether the uppercase of i is I or İ). And CJK gets much gnarlier than that.
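The i/İ disagreement can be seen directly in JavaScript's locale-aware case mapping. This is a small sketch that assumes a runtime with internationalization support (current browsers and Node.js builds normally have it).

```javascript
// Case mapping depends on the language: Turkish has a dotted and a dotless i.
const upperDefault = "i".toUpperCase();           // "I"
const upperTurkish = "i".toLocaleUpperCase("tr"); // "İ" (U+0130)
const lowerTurkish = "I".toLocaleLowerCase("tr"); // "ı" (U+0131)

console.log(upperDefault, upperTurkish, lowerTurkish);
```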

The main Unicode CJK Unified Ideographs block happens to be ordered by radical and number of strokes (Kangxi dictionary order), which may be vaguely useful. But use characters from any of the other CJK extension blocks, or mix in some kana, or romaji, and there will be no meaningful ordering between them.
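To make the radical-and-stroke point concrete, here is a short sketch using the numerals one, two and three. The sample strings are my own; the point is only that code point order follows dictionary filing and not meaning.

```javascript
// 一 is U+4E00, 三 is U+4E09, 二 is U+4E8C.
// 三 is filed under the radical 一, so it sorts before 二, which is its own radical.
const numerals = ["二", "三", "一"];
const sortedNumerals = [...numerals].sort();

// Mixing scripts: Latin letters, then kana, then kanji, purely by block position.
const mixed = ["二", "三", "一", "いち", "ichi"];
const sortedMixed = [...mixed].sort();

console.log(sortedNumerals); // [ '一', '三', '二' ]
console.log(sortedMixed);    // [ 'ichi', 'いち', '一', '三', '二' ]
```

So "one, three, two" is a dictionary-style ordering, and the romaji and kana spellings of "one" end up nowhere near the kanji.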

The Unicode Consortium does attempt to define some general ordering rules (the Unicode Collation Algorithm), but they are complex and are not what the default string comparison uses. Systems that really need language-sensitive sorting abilities (e.g. OSes, databases) tend to have their own collation schemes.
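Where the runtime supports the ECMAScript Internationalization API, `Intl.Collator` gives access to this kind of language-sensitive comparison. The following is a minimal sketch with made-up sample data; the exact results depend on the locale data the runtime ships with, so treat it as an illustration and not a guarantee.

```javascript
// Default sort: all hiragana (U+304x) come before all katakana (U+30Ax).
const kana = ["カ", "あ", "か", "ア"];
const byCodeUnit = [...kana].sort();
console.log(byCodeUnit); // [ 'あ', 'か', 'ア', 'カ' ]

// A Japanese collator compares by sound first, so the "a" kana are
// expected to sort ahead of the "ka" kana regardless of script.
const ja = new Intl.Collator("ja");
const byCollation = [...kana].sort(ja.compare);
console.log(byCollation);

// The same letters can sort differently per language:
// German treats ä as a variant of a, Swedish places it after z.
const letters = ["z", "ä", "a"];
const german = [...letters].sort(new Intl.Collator("de").compare);
const swedish = [...letters].sort(new Intl.Collator("sv").compare);
console.log(german, swedish);
```

If a requested locale is not available, the collator silently falls back to another one, which `Intl.Collator.supportedLocalesOf()` can be used to detect.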

This is different from the ordering of the Japanese syllabary

Yes. Above and beyond collation issues in general, it’s a massively difficult task to handle kanji accurately by syllable, because you have to guess at the pronunciation. JavaScript can’t realistically know that by ‘藤本’ you mean ‘Fujimoto’ and not ‘touhon’; this sort of thing requires in-depth built-in dictionaries and still-unreliable heuristics, which is not the sort of thing you want to build into a programming language.
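One common way around the guessing problem is to store the reading alongside the kanji and sort on that. The sketch below assumes each record carries a hypothetical `reading` field in hiragana, supplied by whoever entered the data; nothing in JavaScript produces it for you.

```javascript
// Hypothetical records: `reading` is assumed to come from the data source
// (for example a separate furigana input), not from any built-in API.
const people = [
  { name: "藤本", reading: "ふじもと" },
  { name: "青木", reading: "あおき" },
  { name: "田中", reading: "たなか" },
];

// Sorting the kanji directly follows code point order (田 < 藤 < 青).
const byKanji = people.map((p) => p.name).sort();

// Sorting on the reading gives syllabary order (あ, た, ふ).
const collator = new Intl.Collator("ja");
const byReading = [...people]
  .sort((a, b) => collator.compare(a.reading, b.reading))
  .map((p) => p.name);

console.log(byKanji);   // [ '田中', '藤本', '青木' ]
console.log(byReading); // [ '青木', '田中', '藤本' ]
```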
