Puzzle 5: Explanation

Let’s find out how strings and characters are counted in Rust.

Test it out

const HELLO_WORLD: &'static str = "Halló heimur";

fn main() {
    println!(
        "{} is {} characters long.",
        HELLO_WORLD,
        HELLO_WORLD.len()
    );
}

Explanation

When run, the program reports that “Halló heimur” is 13 characters long (including the space), even though counting the characters by hand gives only 12. Let’s step back and look at how Rust’s String type works.

The internal definition of the String struct is quite straightforward.


pub struct String { 
    vec: Vec<u8>,
}

A String is just a vector of bytes (u8) that represents Unicode characters in an encoding called UTF-8. Rust stores our strings as UTF-8 automatically, and len() returns the number of bytes in that vector, not the number of characters.
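
As a minimal sketch of this byte-level view, the example below copies out the bytes behind a string and rebuilds the string from them. The helper name utf8_bytes is hypothetical and exists only for illustration; as_bytes, len, and String::from_utf8 are standard library calls.

```rust
/// Hypothetical helper: copies the UTF-8 bytes that back a string.
fn utf8_bytes(text: &str) -> Vec<u8> {
    text.as_bytes().to_vec()
}

fn main() {
    let greeting = String::from("Halló heimur");
    let bytes = utf8_bytes(&greeting);

    // `len()` counts the bytes in the underlying vector.
    assert_eq!(greeting.len(), bytes.len());
    println!("{:?}", bytes);

    // A vector of valid UTF-8 bytes can be turned back into a String.
    let restored = String::from_utf8(bytes).expect("bytes are valid UTF-8");
    assert_eq!(restored, greeting);
}
```

Printing the vector makes the two-byte sequence for ó visible among the single-byte ASCII values.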

The illustration below shows us what the encoding looks like:

Our original string, “Halló heimur”, consists of 11 ASCII characters (including the space) and 1 Latin-1 Supplement character, ó. In UTF-8, each ASCII character requires one byte to encode, while each Latin-1 Supplement character requires two. That gives 11 × 1 + 1 × 2 = 13 bytes for 12 characters.
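
The arithmetic above can be checked with a short sketch that compares the byte count with the character count. The helper byte_and_char_counts is a hypothetical name introduced here; chars().count() counts Unicode scalar values, and the example assumes ó is the single precomposed character rather than an o followed by a combining accent.

```rust
/// Hypothetical helper: returns (byte count, character count) for a string.
fn byte_and_char_counts(text: &str) -> (usize, usize) {
    (text.len(), text.chars().count())
}

fn main() {
    const HELLO_WORLD: &str = "Halló heimur";

    let (bytes, chars) = byte_and_char_counts(HELLO_WORLD);
    println!("{} bytes, {} characters", bytes, chars);

    // List only the characters that need more than one byte in UTF-8.
    for c in HELLO_WORLD.chars().filter(|c| c.len_utf8() > 1) {
        println!("{:?} takes {} bytes", c, c.len_utf8());
    }
}
```

If the intent is to report the number of characters, chars().count() is the closer match than len(), with the caveat that it counts scalar values rather than what a reader would perceive as characters in every script.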

Rust’s string encoding is smart enough not to store extra zeros for each Unicode character. If it did, the string would be a vector of char types. Rust’s char is exactly four ...