Confusion about sizeof(char) in ISO C/C++ with character encodings such as UTF-16

Question

The C++ standard (and the C standard, for that matter) effectively defines a byte as the size of a char, not as an eight-bit quantity¹. As per C++11 1.7/1:

The fundamental storage unit in the C++ memory model is the byte. A byte is at least large enough to contain any member of the basic execution character set and the eight-bit code units of the Unicode UTF-8 encoding form and is composed of a contiguous sequence of bits, the number of which is implementation defined.

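Hence the expression sizeof(char) is always 1, no matter how many bits a char actually contains. The standard also states this explicitly in C++11 5.3.3/1: sizeof(char), sizeof(signed char) and sizeof(unsigned char) are 1.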

If you want to see whether your baseline char variable (the unsigned variant is probably the best choice) can actually hold a 16-bit value, such as a UTF-16 code unit, the item you want to look at is CHAR_BIT from <climits>. It holds the number of bits in a char, which is the same as the number of bits in a byte.

¹ Many standards, especially ones related to communications protocols, use the more exact term octet for an eight-bit value.
