Double-quoted Strings Are Binaries

Learn how Elixir stores double-quoted strings as binaries and how the String module works with them.

Introduction

Unlike single-quoted strings, which are lists of character codes, the contents of a double-quoted string (dqs) are stored as a consecutive sequence of bytes in UTF-8 encoding. This is more efficient in terms of memory and certain forms of access, but it has two implications.

  • First, because UTF-8 characters can take more than a single byte to represent, the size of the binary is not necessarily the length of the string.
     iex> dqs = "∂x/∂y" 
     "∂x/∂y"
     iex> String.length dqs 
     5
     iex> byte_size dqs
     9
     iex> String.at(dqs, 0) 
     "∂"
     iex> String.codepoints(dqs) 
     ["∂", "x", "/", "∂", "y"] 
     iex> String.split(dqs, "/") 
     ["∂x", "∂y"]
    
  • Second, because we’re no longer using lists, we need to learn and work with the binary syntax alongside the list syntax in our code.
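
To make the second point concrete, here is a minimal sketch that takes the first character off a list-based string and off a binary string. The module name HeadDemo and its two functions are hypothetical, introduced only for this illustration.

```elixir
defmodule HeadDemo do
  # Hypothetical helper module, named only for this illustration.

  # List syntax: a single-quoted string is a list of integer codepoints,
  # so the usual [head | tail] pattern applies.
  def split_charlist([head | tail]), do: {head, tail}

  # Binary syntax: ::utf8 decodes one UTF-8 encoded codepoint,
  # and ::binary matches the remaining bytes.
  def split_binary(<<head::utf8, rest::binary>>), do: {head, rest}
end

# ~c"dog" is the same value as the single-quoted 'dog'.
# head is the integer 100, the codepoint for "d".
IO.inspect(HeadDemo.split_charlist(~c"dog"))

# head is 8706 (U+2202, the "∂" character), rest is "og".
IO.inspect(HeadDemo.split_binary("∂og"))

# The same string written out byte by byte: "∂" occupies three bytes.
IO.inspect("∂og" == <<226, 136, 130, "og">>)
```

Both functions do the same job, but the binary version has to say how to interpret the leading bytes (here, as one UTF-8 codepoint), which is the extra syntax the bullet refers to.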

Strings and Elixir libraries

When Elixir library documentation uses the word “string” (and most of the time it uses the word “binary”), it means double-quoted strings. The String module defines functions that work with double-quoted strings. Let’s cover some of them below with examples.

  • at(str, offset) returns the grapheme at the given offset (starting at 0). Negative offsets count from the end of the string.

     iex> String.at("∂og", 0) 
     "∂"
     iex> String.at("∂og", -1) 
     "g"
    
  • capitalize(str) converts str to lowercase, and then capitalizes the first character.

     iex> String.capitalize "école" 
     "École"
     iex> String.capitalize "ÎÎÎÎÎ" 
     "Îîîîî"
    
  • codepoints(str) returns the codepoints in str.

     iex> String.codepoints("José's ∂øg")
     ["J", "o", "s", "é", "'", "s", " ", "∂", "ø", "g"]
    
  • downcase(str) converts str to lowercase.

     iex> String.downcase "ØRSteD"
     "ørsted"
    
  • duplicate(str, n) returns a string containing n copies of str.

     iex> String.duplicate "Ho! ", 3
     "Ho! Ho! Ho! "
    
  • first(str) returns the first grapheme from str.

     iex> String.first "∂og"
     "∂"
    
  • graphemes(str) returns the graphemes in the string. This is different from the codepoints function, which lists combining characters separately. The following example uses a combining diaeresis along with the letter e to represent ë. (It might not display properly on your reader.)

     iex> String.codepoints "noe\u0308l" 
     ["n", "o", "e", "̈", "l"]
     iex> String.graphemes "noe\u0308l" 
     ["n", "o", "ë", "l"]
    
  • jaro_distance(str1, str2) returns a float between 0 and 1, indicating the likely similarity of the two strings.

     iex> String.jaro_distance("jonathan", "jonathon") 
     0.9166666666666666
     iex> String.jaro_distance("josé", "john") 
     0.6666666666666666
    
  • last(str) returns the last grapheme from str.

     iex> String.last "∂og"
     "g"
    
  • length(str) returns the number of graphemes in str.

     iex> String.length "∂x/∂y"
     5
    
  • myers_difference(str1, str2) returns the list of transformations needed to convert str1 into str2.

     iex> String.myers_difference("banana", "panama")
     [del: "b", ins: "p", eq: "ana", del: "n", ins: "m", eq: "a"]
    
  • next_codepoint(str) splits str into its leading codepoint and the rest, or returns nil if str is empty. This may be used as the basis of an iterator.

Note: Iterating over the string "∂og" with next_codepoint yields its characters one at a time: "∂", "o", and "g".
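
One way such an iterator might look is sketched below. The module name Utf8Walk and its each function are assumptions made for this example and are not part of the String module; only String.next_codepoint comes from the library.

```elixir
defmodule Utf8Walk do
  # Hypothetical module name, used only for this illustration.

  # Calls fun once for each codepoint of str, from left to right.
  def each(str, fun) when is_binary(str) and is_function(fun, 1) do
    walk(String.next_codepoint(str), fun)
  end

  # A {codepoint, rest} tuple means there is more to process.
  defp walk({codepoint, rest}, fun) do
    fun.(codepoint)
    walk(String.next_codepoint(rest), fun)
  end

  # next_codepoint returns nil once the string is empty, which ends the walk.
  defp walk(nil, _fun), do: :ok
end

# Prints "∂", "o", and "g", each on its own line.
Utf8Walk.each("∂og", fn char -> IO.puts(char) end)
```

Each step hands the leading codepoint to the callback and recurses on the remainder, so the string is never converted to a list along the way.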
