- `str.length` gives the count of UTF-16 code units.
- A Unicode-aware way to get the string length in code points (characters) is `[...str].length`, because the iteration protocol splits the string into code points.
- If we need the length in graphemes (grapheme clusters), we have these native ways:
  a. Unicode property escapes in RegExp. See, for example, "Unicode-aware version of \w" or "Matching emoji".
  b. `Intl.Segmenter`: coming soon, probably in ES2021. It can be tested behind a flag in recent V8 versions (the implementation was synced with the latest spec in V8 86) and is unflagged (shipped) in V8 87.
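The three ways of counting described above can be compared side by side. This is only a minimal sketch: the helper names (`lengthInCodeUnits`, `lengthInCodePoints`, `lengthInGraphemes`, `roughGraphemeCount`) and the default `"en"` locale are hypothetical choices for illustration, and the regex variant is a rough approximation that only folds combining marks into their base character, not a full grapheme segmentation.

```javascript
// Hypothetical helper names, chosen for illustration only.

// Count of UTF-16 code units.
function lengthInCodeUnits(str) {
  return str.length;
}

// Count of code points: the string iterator yields one code point per step.
function lengthInCodePoints(str) {
  return [...str].length;
}

// Count of grapheme clusters via Intl.Segmenter.
// Assumes a runtime that ships Intl.Segmenter (e.g. V8 87 or later).
function lengthInGraphemes(str, locale = "en") {
  if (typeof Intl === "undefined" || typeof Intl.Segmenter !== "function") {
    throw new Error("Intl.Segmenter is not available in this runtime");
  }
  const segmenter = new Intl.Segmenter(locale, { granularity: "grapheme" });
  return [...segmenter.segment(str)].length;
}

// Rough approximation with Unicode property escapes: a non-mark code point
// followed by any number of combining marks counts as one unit.
// It does NOT handle ZWJ emoji sequences, flags, and similar clusters.
function roughGraphemeCount(str) {
  const matches = str.match(/\P{Mark}\p{Mark}*/gu);
  return matches === null ? 0 : matches.length;
}

// "e" followed by U+0301 COMBINING ACUTE ACCENT.
const accented = "e\u0301";
// Man + ZWJ + woman + ZWJ + girl (a single family emoji).
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}";

console.log(lengthInCodeUnits(family));   // 8
console.log(lengthInCodePoints(family));  // 5
console.log(roughGraphemeCount(accented)); // 1
if (typeof Intl !== "undefined" && typeof Intl.Segmenter === "function") {
  console.log(lengthInGraphemes(family)); // 1
}
```

The regex approach is handy where `Intl.Segmenter` is unavailable, but for the family emoji it reports 5 rather than 1, which is why the segmenter is the safer choice when it exists.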
See also:
- The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
- What every JavaScript developer should know about Unicode
- JavaScript has a Unicode problem
- Unicode-aware regular expressions in ES2015
- ES6 Strings (and Unicode, ❤) in Depth
- JavaScript for impatient programmers. Unicode – a brief introduction