How to count the correct length of a string with emojis in JavaScript?

  1. str.length gives the count of UTF-16 code units.

  2. A Unicode-aware way to get the string length in code points (roughly, in characters) is [...str].length, because the string iterator splits the string into code points rather than UTF-16 code units.

  3. If we need the length in graphemes (grapheme clusters), we have these native ways:

    a. Unicode property escapes in RegExp. See for example: Unicode-aware version of \w or Matching emoji.

    b. Intl.Segmenter, part of the ECMAScript Internationalization API. It could be tested behind a flag in earlier V8 versions (the implementation was synced with the latest spec in V8 8.6) and is unflagged (shipped) since V8 8.7.
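
The three measures above can be compared side by side. This is only a minimal sketch, assuming an engine that ships Intl.Segmenter (V8 8.7 or later) with reasonably current Unicode data. The function names and the sample string are made up for illustration and come from no library. The property-escape helper only detects pictographic code points; it does not count grapheme clusters by itself.

```javascript
// Sample: "a", a thumbs-up with a skin-tone modifier, and a family emoji
// built from three emoji joined by ZERO WIDTH JOINER (U+200D).
const sample = "a\u{1F44D}\u{1F3FD}\u{1F468}\u200D\u{1F469}\u200D\u{1F467}";

// 1. UTF-16 code units: what str.length reports.
function countCodeUnits(str) {
  return str.length;
}

// 2. Code points: the string iterator yields one code point per step,
//    so spreading into an array gives one element per code point.
function countCodePoints(str) {
  return [...str].length;
}

// 3b. Grapheme clusters via Intl.Segmenter.
//     The default locale "en" is an arbitrary choice for this sketch.
function countGraphemes(str, locale = "en") {
  const segmenter = new Intl.Segmenter(locale, { granularity: "grapheme" });
  let count = 0;
  for (const segment of segmenter.segment(str)) {
    count++;
  }
  return count;
}

// 3a. Unicode property escapes (require the "u" flag): here only used to
//     check whether the string contains a pictographic code point.
function containsPictographic(str) {
  return /\p{Extended_Pictographic}/u.test(str);
}

console.log(countCodeUnits(sample));       // 13
console.log(countCodePoints(sample));      // 8
console.log(countGraphemes(sample));       // 3
console.log(containsPictographic(sample)); // true
console.log(containsPictographic("abc"));  // false
```

Which count is "correct" depends on the use case: code units matter for indexing and storage limits, while grapheme clusters are usually closest to what a user perceives as one character.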

See also:

  • The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)

  • What every JavaScript developer should know about Unicode

  • JavaScript has a Unicode problem

  • Unicode-aware regular expressions in ES2015

  • ES6 Strings (and Unicode, ❤) in Depth

  • JavaScript for impatient programmers. Unicode – a brief introduction
