Puzzle 5: Explanation
Let’s find out how strings and characters are counted in Rust.
Test it out
Hit “Run” to see the code’s output.
const HELLO_WORLD: &'static str = "Halló heimur";

fn main() {
    println!(
        "{} is {} characters long.",
        HELLO_WORLD,
        HELLO_WORLD.len()
    );
}
Explanation
The program above reports that “Halló heimur” is 13 characters long, yet counting by hand gives 12 characters (including the space). Let’s step back and look at how Rust’s String type works.
The internal definition of the String struct is quite straightforward:
pub struct String {
    vec: Vec<u8>,
}
A String is just a vector of bytes (u8) that represents Unicode characters in an encoding called UTF-8. Rust stores our string literals as UTF-8 automatically.
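To make the byte view concrete, here is a minimal sketch that prints the UTF-8 bytes behind our string using the standard as_bytes method. The helper name utf8_bytes is made up for illustration, and the ó is written as the escape \u{f3} so the example does not depend on how the source file encodes it.

```rust
/// Returns a copy of the UTF-8 bytes that back a string slice.
/// (`utf8_bytes` is a hypothetical helper name, not part of the standard library.)
fn utf8_bytes(s: &str) -> Vec<u8> {
    s.as_bytes().to_vec()
}

fn main() {
    // "\u{f3}" is the precomposed character ó (U+00F3).
    let bytes = utf8_bytes("Hall\u{f3} heimur");

    // Print each byte in hexadecimal. ASCII letters take one byte each,
    // while ó shows up as the two-byte sequence c3 b3.
    for b in &bytes {
        print!("{:02x} ", b);
    }
    println!();
    println!("{} bytes in total", bytes.len());
}
```

The two bytes c3 b3 in the middle of the output are the single character ó.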
The illustration below shows us what the encoding looks like:
Our original string, “Halló heimur”, consists of 11 ASCII characters (including the space) and 1 Latin-1 Supplement character, ó. ASCII characters require one byte to encode. Latin-1 Supplement characters require two bytes. That gives 11 + 2 = 13 bytes, which is the number len() returns.
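The difference between bytes and characters can be checked directly. The sketch below uses the standard chars, len_utf8, and len methods; char_count is a hypothetical helper name introduced only for this example. Note that chars counts Unicode scalar values, which matches the visible characters here because ó is written as the single code point U+00F3.

```rust
/// Counts Unicode scalar values (Rust `char`s) rather than bytes.
/// (`char_count` is a hypothetical helper name used only for this sketch.)
fn char_count(s: &str) -> usize {
    s.chars().count()
}

fn main() {
    // "\u{f3}" is the precomposed character ó (U+00F3).
    let s = "Hall\u{f3} heimur";

    // Show how many UTF-8 bytes each character needs.
    for c in s.chars() {
        println!("{:?} needs {} byte(s)", c, c.len_utf8());
    }

    // len() counts bytes; chars().count() counts characters.
    println!("len() reports {} bytes", s.len());
    println!("chars().count() reports {} characters", char_count(s));
}
```

If the goal of the puzzle program is to report characters rather than bytes, replacing HELLO_WORLD.len() with HELLO_WORLD.chars().count() is one way to do it.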
Rust’s string encoding is smart enough not to store extra zeros for each Unicode character. If it did, the string would be a vector of char types. Rust’s char is exactly four ...
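As a rough comparison of the two representations, the sketch below measures how much space the same text would need if every character were stored as a char. It relies on std::mem::size_of; bytes_as_chars is a hypothetical helper name, and the figure it returns ignores any bookkeeping overhead of the vector itself.

```rust
use std::mem::size_of;

/// Bytes needed to hold the same text as a sequence of `char` values.
/// (`bytes_as_chars` is a hypothetical helper name used only for this sketch.)
fn bytes_as_chars(s: &str) -> usize {
    s.chars().count() * size_of::<char>()
}

fn main() {
    // "\u{f3}" is the precomposed character ó (U+00F3).
    let s = "Hall\u{f3} heimur";

    println!("size of one char: {} bytes", size_of::<char>());
    println!("stored as UTF-8: {} bytes", s.len());
    println!("stored as chars: {} bytes", bytes_as_chars(s));
}
```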