The New Character Type for UTF-8 Strings: char8_t

Get introduced to a new character type, 'char8_t'.

In addition to the character types char16_t and char32_t from C++11, C++20 gets the new character type char8_t. The type char8_t is large enough to represent any UTF-8 code unit (8 bits). It has the same size, signedness, and alignment as unsigned char but is a distinct type.

🔑 char versus char8_t

A char always occupies exactly one byte. The number of bits in a byte, and hence in a char, is not fixed by the standard: it is implementation-defined (CHAR_BIT) and at least 8. Nearly all implementations use 8 bits for a byte. The std::string is an alias for a std::basic_string of chars.
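As a small sketch of these statements, the checks below query the platform rather than assume 8 bits. The helper name bitsPerChar is hypothetical; CHAR_BIT comes from the standard header <climits>.

```cpp
#include <climits>
#include <string>
#include <type_traits>

// A char always occupies exactly one byte ...
static_assert(sizeof(char) == 1);

// ... but the width of that byte is implementation-defined: at least 8 bits.
static_assert(CHAR_BIT >= 8);

// std::string is an alias for std::basic_string<char>.
static_assert(std::is_same_v<std::string, std::basic_string<char>>);

// Hypothetical helper: number of bits in one char on this platform.
constexpr int bitsPerChar() { return CHAR_BIT; }
```

On nearly all current platforms bitsPerChar() yields 8, but the code only relies on the guaranteed lower bound.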

std::string: std::basic_string<char>
"Hello World"

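Consequently, C++20 has a new typedef for the character type char8_t, std::u8string (an alias for std::basic_string<char8_t>), and a UTF-8 string literal written with the u8 prefix, such as u8"Hello World", whose elements are of type char8_t.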
