How to protect against diacritics such as Zalgo text

Is there even a limit?! Not intrinsically in Unicode. There is the concept of a ‘Stream-Safe’ format in UAX #15 that sets a limit of 30 combiners… Unicode strings in general are not guaranteed to be Stream-Safe, but this could certainly be taken as a sign that Unicode doesn’t intend to standardise new characters that would …
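
A rough Python sketch of a Stream-Safe-style check follows. It measures the longest run of code points whose canonical combining class is non-zero and compares it with the limit of 30. It simplifies UAX #15 in one important way: the standard counts non-starters in each character's NFKD decomposition, and this sketch only looks at the code points as given. The function names are hypothetical.

```python
import unicodedata

# UAX #15 Stream-Safe Text Format: no more than 30 consecutive non-starters.
STREAM_SAFE_LIMIT = 30


def longest_non_starter_run(text: str) -> int:
    """Length of the longest run of code points with a non-zero combining class.

    Simplification (assumption): UAX #15 counts non-starters in each
    character's NFKD decomposition; this only inspects code points as given.
    """
    longest = run = 0
    for ch in text:
        if unicodedata.combining(ch):  # canonical combining class != 0
            run += 1
            longest = max(longest, run)
        else:
            run = 0  # a starter ends the current run
    return longest


def is_roughly_stream_safe(text: str, limit: int = STREAM_SAFE_LIMIT) -> bool:
    """True if no run of non-starters exceeds `limit` (hypothetical helper)."""
    return longest_non_starter_run(text) <= limit


zalgo_like = "a" + "\u0301" * 40  # 'a' followed by 40 COMBINING ACUTE ACCENTs
print(longest_non_starter_run(zalgo_like), is_roughly_stream_safe(zalgo_like))
```

Because Unicode strings are not guaranteed to be Stream-Safe, treating a failed check as grounds for rejecting or flagging input is an application decision, not something the standard requires.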

What’s up with these Unicode combining characters and how can we filter them?

What’s up with these Unicode characters? Each one is a base character followed by a series of combining characters. Because the combining characters in question want to go above the base character, they stack up (literally). For instance, the case of ก้้้้้้้้้้้้้้้้้้้้ … it’s a ก (Thai character ko kai, U+0E01) followed by 20 copies of the Thai combining character …
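
One way to inspect and filter such strings is sketched below in Python. It treats any code point whose general category is Mn, Mc or Me as a mark, reports how many marks follow each base character, and keeps at most a fixed number of marks per base. The names `is_mark`, `mark_runs` and `limit_marks` and the default cap of 2 are hypothetical choices for illustration. Going by general category rather than by combining class also catches marks whose canonical combining class is 0, such as U+034F COMBINING GRAPHEME JOINER.

```python
import unicodedata


def is_mark(ch: str) -> bool:
    # General categories Mn, Mc and Me are the combining marks.
    return unicodedata.category(ch).startswith("M")


def mark_runs(text: str) -> list[tuple[str, int]]:
    """Pair each base character with the number of marks that follow it.

    A mark with no preceding base is reported as its own entry.
    """
    runs: list[tuple[str, int]] = []
    for ch in text:
        if is_mark(ch) and runs:
            base, count = runs[-1]
            runs[-1] = (base, count + 1)
        else:
            runs.append((ch, 0))
    return runs


def limit_marks(text: str, max_marks: int = 2) -> str:
    """Keep at most `max_marks` marks after each base character."""
    out = []
    count = 0
    for ch in text:
        if is_mark(ch):
            count += 1
            if count > max_marks:
                continue  # drop the excess mark
        else:
            count = 0  # a new base character resets the counter
        out.append(ch)
    return "".join(out)


stacked = "a" + "\u0301" * 20  # hypothetical stand-in for a stacked character
print(mark_runs(stacked))               # one base with 20 marks
print(len(limit_marks(stacked)))        # base plus 2 marks
```

Capping rather than deleting every mark keeps legitimate accents and scripts such as Thai, which rely on combining vowel and tone marks, readable; the right cap depends on the scripts an application needs to support.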

How does Zalgo text work?

The text uses combining characters, also known as combining marks. See section 2.11, “Combining Characters”, of the Unicode Standard (PDF). In Unicode, character rendering does not use a simple character-cell model where each glyph fits into a box of a given height. Combining marks may be rendered above, below, or inside a base character …
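
To see the mechanism in action, the Python sketch below builds Zalgo-style text by appending random nonspacing marks from the Combining Diacritical Marks block (U+0300 to U+036F) after each non-space character, then strips them again by general category. The function names, the default of 10 marks per character and the choice of block are assumptions for illustration, not how any particular Zalgo generator works.

```python
import random
import unicodedata
from typing import Optional

# Nonspacing marks from the Combining Diacritical Marks block (U+0300..U+036F).
COMBINING_MARKS = [
    chr(cp) for cp in range(0x0300, 0x0370) if unicodedata.category(chr(cp)) == "Mn"
]


def zalgo(text: str, marks_per_char: int = 10, seed: Optional[int] = None) -> str:
    """Append `marks_per_char` random combining marks after each non-space char."""
    rng = random.Random(seed)
    out = []
    for ch in text:
        out.append(ch)
        if not ch.isspace():
            out.extend(rng.choice(COMBINING_MARKS) for _ in range(marks_per_char))
    return "".join(out)


def strip_marks(text: str) -> str:
    """Remove every code point whose general category is Mn, Mc or Me."""
    return "".join(ch for ch in text if not unicodedata.category(ch).startswith("M"))


noisy = zalgo("hi there", marks_per_char=5, seed=1)
print(noisy)
print(strip_marks(noisy))  # the marks are gone; the base characters remain
```

Each mark is a separate code point with no advance width of its own, so a renderer that honours them simply keeps drawing them above or below the same base glyph, which is what produces the overflowing stacks.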
