Double-quoted Strings Are Binaries
Learn how double-quoted strings are stored as binaries and how the functions in Elixir's String module work with them.
Introduction
Unlike single-quoted strings, which are lists of codepoints, the contents of a double-quoted string (dqs) are stored as a consecutive sequence of bytes in UTF-8 encoding. This is more efficient in terms of memory and certain forms of access, but it has two implications.
- First, because UTF-8 characters can take more than a single byte to represent, the size of the binary is not necessarily the length of the string.
    iex> dqs = "∂x/∂y"
    "∂x/∂y"
    iex> String.length dqs
    5
    iex> byte_size dqs
    9
    iex> String.at(dqs, 0)
    "∂"
    iex> String.codepoints(dqs)
    ["∂", "x", "/", "∂", "y"]
    iex> String.split(dqs, "/")
    ["∂x", "∂y"]
- Second, because we're no longer using lists, we need to learn and work with the binary syntax alongside the list syntax in our code.
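Both implications can be seen in a short script. The following is a minimal sketch rather than code from the lesson: the variable names (chars, first, rest, head) are illustrative, and String.to_charlist/1 is used only to obtain the list form of the same text for comparison.

```elixir
# The same text as a binary (double-quoted) and as a list of codepoints.
dqs = "∂x/∂y"
chars = String.to_charlist(dqs)   # [8706, 120, 47, 8706, 121]

IO.inspect(is_binary(dqs))        # true
IO.inspect(is_list(chars))        # true

# Implication 1: the byte size differs from the string length,
# because "∂" takes three bytes in UTF-8.
IO.inspect(String.length(dqs))    # 5
IO.inspect(byte_size(dqs))        # 9

# Implication 2: binaries are taken apart with binary syntax...
<<first::utf8, rest::binary>> = dqs
IO.inspect(first)                 # 8706, the codepoint of "∂"
IO.inspect(rest)                  # "x/∂y"

# ...while lists are taken apart with list syntax.
[head | _tail] = chars
IO.inspect(head)                  # 8706
```

The ::utf8 modifier in the binary pattern matches one complete UTF-8 encoded codepoint, so first holds the same integer as the head of the list.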
Strings and Elixir libraries
When Elixir library documentation uses the word “string” (and most of the time it uses the word “binary”), it means double-quoted strings. The String module defines functions that work with double-quoted strings. Let’s cover some of them below with examples.
- at(str, offset) returns the grapheme at the given offset (starting at 0). Negative offsets count from the end of the string.
    iex> String.at("∂og", 0)
    "∂"
    iex> String.at("∂og", -1)
    "g"
- capitalize(str) converts str to lowercase, and then capitalizes the first character.
    iex> String.capitalize "école"
    "École"
    iex> String.capitalize "ÎÎÎÎÎ"
    "Îîîîî"
- codepoints(str) returns the codepoints in str.
    iex> String.codepoints("José's ∂øg")
    ["J", "o", "s", "é", "'", "s", " ", "∂", "ø", "g"]
- downcase(str) converts str to lowercase.
    iex> String.downcase "ØRSteD"
    "ørsted"
- duplicate(str, n) returns a string containing n copies of str.
    iex> String.duplicate "Ho! ", 3
    "Ho! Ho! Ho! "
- first(str) returns the first grapheme from str.
    iex> String.first "∂og"
    "∂"
- graphemes(str) returns the graphemes in the string. This is different from the codepoints function, which lists combining characters separately. The following example uses a combining diaeresis along with the letter e to represent ë. (It might not display properly on your reader.)
    iex> String.codepoints "noe\u0308l"
    ["n", "o", "e", "̈", "l"]
    iex> String.graphemes "noe\u0308l"
    ["n", "o", "ë", "l"]
- jaro_distance returns a float between 0 and 1, indicating the likely similarity of two strings.
    iex> String.jaro_distance("jonathan", "jonathon")
    0.9166666666666666
    iex> String.jaro_distance("josé", "john")
    0.6666666666666666
- last(str) returns the last grapheme from str.
    iex> String.last "∂og"
    "g"
- length(str) returns the number of graphemes in str.
    iex> String.length "∂x/∂y"
    5
- myers_difference returns the list of transformations needed to convert one string to another.
    iex> String.myers_difference("banana", "panama")
    [del: "b", ins: "p", eq: "ana", del: "n", ins: "m", eq: "a"]
- next_codepoint(str) splits str into its leading codepoint and the rest, or returns nil if str is empty. This may be used as the basis of an iterator.
Note: When we run the code below, it separates the characters of the ∂og string.
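One way to build an iterator on top of String.next_codepoint/1 is sketched here. This is an illustrative example under stated assumptions, not necessarily the lesson's own code: the module name CodepointWalker and its each/2 function are hypothetical names chosen for the sketch.

```elixir
defmodule CodepointWalker do
  # Hypothetical helper: calls `fun` once for every codepoint in `str`,
  # in order, and returns :ok when the string is exhausted.
  def each(str, fun) when is_binary(str) and is_function(fun, 1) do
    case String.next_codepoint(str) do
      {codepoint, rest} ->
        # A codepoint was split off; handle it, then recurse on the rest.
        fun.(codepoint)
        each(rest, fun)

      nil ->
        # next_codepoint/1 returns nil for an empty string, which ends the walk.
        :ok
    end
  end
end

# Prints "∂", "o", and "g" on separate lines.
CodepointWalker.each("∂og", fn c -> IO.puts(c) end)
```

Because the recursive call is the last expression in its clause, the walk is tail-recursive and does not grow the stack with the length of the string.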