Truncating a Unicode string so it fits a maximum size when encoded for wire transfer

January 5, 2024 by Tarik

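def unicode_truncate(s, length, encoding='utf-8'):
    # Cut the encoded bytes at the limit; this may split the last character.
    encoded = s.encode(encoding)[:length]
    # 'ignore' drops the incomplete trailing bytes instead of raising.
    return encoded.decode(encoding, 'ignore')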

Here is an example with a Unicode string in which each character takes 2 bytes in UTF-8. Cutting at 5 bytes splits the third character in half, and decoding would have raised a UnicodeDecodeError if the incomplete trailing byte weren't ignored:

>>> unicode_truncate(u'абвгд', 5)
u'\u0430\u0431'
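
To see what the 'ignore' handler is protecting against, here is a small sketch, assuming Python 3 (where the result prints as 'аб' rather than u'\u0430\u0431'). It repeats the function from above so it can run on its own; the names text, cut, strict_failed and result are just for illustration. It slices the same string to 5 bytes, tries a strict decode, and then checks that the truncated result really fits the byte limit:

```python
def unicode_truncate(s, length, encoding='utf-8'):
    encoded = s.encode(encoding)[:length]
    return encoded.decode(encoding, 'ignore')

text = 'абвгд'

# 5 bytes: two complete 2-byte characters plus the first byte of the third.
cut = text.encode('utf-8')[:5]

try:
    cut.decode('utf-8')  # strict error handling is the default
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

result = unicode_truncate(text, 5)

print(strict_failed)                 # True
print(result)                        # аб
print(len(result.encode('utf-8')))   # 4, which is within the 5-byte limit
```

Note that the result can come out shorter than the limit: the dangling byte is dropped, so only 4 of the 5 allowed bytes are used here.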
