How to represent strings in YAML

Strings

A string is a sequence of characters, such as letters, digits, and symbols. Strings are used to communicate information in textual form.

If we want to represent a string with quotes, we can use either single or double quotes. For example, both of the following are valid representations of the string "hello world":

# single-quoted scalars
'hello world'
# double-quoted scalars
"hello world"

If we want to represent a string without quotes, we can use the unquoted style. This is how we can represent the same string, "hello world", without using any quotes:

# unquoted scalars
hello world

The following are a few rules to follow when using the unquoted style:

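1. The string cannot start with an indicator character such as [ ] { } , # & * ! | > ' " % @ or `. It can start with -, ?, or : only if the next character is not a space.

2. The string cannot contain a colon followed by a space (: ) or a space followed by a hash ( #), because these introduce a mapping value and a comment.

3. The string should not look like a value of another type. Unquoted values such as true, null, or 123 are typically read as a boolean, a null, and a number, so they need quotes to stay strings.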

Representing strings in YAML

In YAML, strings can be represented in the following ways:

  1. Single-quoted scalars
  2. Double-quoted scalars
  3. Unquoted scalars
  4. Folded scalars
  5. Literal scalars

Let's look into each of these ways of representing strings in detail.

Single-quoted scalars

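A single-quoted scalar is a basic string in which every character is taken literally. The backslash (\) has no special meaning, and the only escape is a doubled single quote (''), which represents one single quote ('). A single-quoted scalar can span multiple lines, in which case each line break is folded into a space.

These are useful when a string contains characters such as \, #, or : that should not be interpreted. Examples of single-quoted scalars are as follows:

'I''m a YAML string!'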
'I am a YAML string!'
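
As a brief illustration of how single quotes treat their content, the sketch below places a few single-quoted scalars in a mapping. The key names (apostrophe, windows_path, comment_like) are hypothetical and chosen only for this example.

```yaml
# Hypothetical keys, for illustration only.
apostrophe: 'I''m a YAML string!'      # value: I'm a YAML string!
windows_path: 'C:\temp\new'            # backslashes stay as-is; \t and \n are not escapes here
comment_like: 'value # not a comment'  # the # inside the quotes is part of the string
```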

Double-quoted scalars

A double-quoted scalar is a more flexible string that can contain any character, span multiple lines, and use escape sequences introduced by the backslash (\), such as \n for a line feed, \t for a tab, \" for a double quote, and \\ for a backslash itself.

These are useful when a string needs control characters or other characters that are hard to type directly. An example of a double-quoted scalar is as follows:

"I'm a YAML string!"

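To make the escape handling of double-quoted scalars more concrete, here is a minimal sketch. The key names are hypothetical, and only a handful of the available escape sequences are shown.

```yaml
# Hypothetical keys, for illustration only.
apostrophe: "I'm a YAML string!"       # a single quote needs no escaping here
two_lines: "first line\nsecond line"   # \n becomes a line feed
tabbed: "name\tvalue"                  # \t becomes a tab
quoted: "She said \"hi\""              # \" is a literal double quote
backslash: "C:\\temp"                  # \\ is a literal backslash
unicode: "caf\u00E9"                   # \u00E9 is the character é
```
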
Unquoted scalars

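An unquoted scalar, which the YAML specification calls a plain scalar, is a simple string with no surrounding quotes and no escape sequences. It can contain spaces and most punctuation, subject to the following restrictions:

  • It cannot start with an indicator character: [ ] { } , # & * ! | > ' " % @ `
  • It can start with -, ? or : only when the next character is not a space
  • It cannot contain a colon followed by a space (: ) or a space followed by a hash ( #)
  • Inside flow collections ([...] or {...}), it cannot contain [ ] { } ,
  • Leading and trailing whitespace is not part of the value
  • Tabs, line feeds, and other control characters cannot be written as escapes such as \t or \n; use a double-quoted scalar for those

Unquoted scalars can span multiple lines, in which case each line break is folded into a space. These are useful when representing data that does not require any special characters. An example of unquoted scalars is as follows: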

I'm a YAML string!
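
The sketch below shows a few unquoted scalars next to values that need quotes to stay strings. The key names are hypothetical, and the type comments assume a parser that follows the YAML 1.2 core schema; other parsers or YAML versions may resolve some values differently.

```yaml
# Hypothetical keys, for illustration only.
greeting: hello world            # string; spaces inside are fine
sentence: I'm a YAML string!     # string; an apostrophe in the middle is fine
url: http://example.com/path     # string; the colon is not followed by a space
note: "key: value"               # quoted because it contains a colon followed by a space
version_text: "1.0"              # quoted; unquoted 1.0 would be read as a number
enabled_text: "true"             # quoted; unquoted true would be read as a boolean
```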

Folded scalars

The > character is used to denote folded scalars, which are strings written across multiple lines. The content starts on the line after the > indicator and must be indented; it continues until a line with less indentation is reached. Each single line break inside the content is folded into a space, while a blank line produces a line break. An example of a folded scalar is as follows:

>
  This is a folded string
  spanning multiple lines.
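
In practice a folded scalar usually appears as the value of a key. The following is a minimal sketch with a hypothetical key named description; the comment states the resulting value, assuming the default handling of the final line break.

```yaml
# Hypothetical key, for illustration only.
# Resulting value:
#   "This is a folded string spanning multiple lines.\nA blank line starts a new line.\n"
description: >
  This is a folded string
  spanning multiple lines.

  A blank line starts a new line.
```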

Literal scalars

The | character is used to denote literal scalars, which are strings that preserve line breaks and any indentation beyond that of the block itself. The content starts on the line after the | indicator and must be indented; it continues until a line with less indentation is reached. An example of a literal scalar is as follows:

|
  This is a literal string
  spanning multiple lines.
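
As a small illustration, a literal scalar is a common choice for embedding a script or other preformatted text. The key name script is hypothetical; the comment states the resulting value, assuming the default handling of the final line break.

```yaml
# Hypothetical key, for illustration only.
# Resulting value (line breaks and the extra indentation are kept):
#   "echo \"first line\"\n  echo \"indented line\"\necho \"last line\"\n"
script: |
  echo "first line"
    echo "indented line"
  echo "last line"
```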

The + after | is the keep chomping indicator: |+ preserves the final line break and any trailing blank lines.

The - after | is the strip chomping indicator: |- removes the final line break and any trailing blank lines. With neither indicator, a single final line break is kept and trailing blank lines are dropped. The same indicators can follow > for folded scalars.
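
The effect of the chomping indicators is easiest to see side by side. In this sketch the key names (clip, strip, keep, end) are hypothetical, and each block is followed by one blank line so that the difference shows up.

```yaml
# Hypothetical keys, for illustration only.
# Resulting values:
#   clip  -> "text\n"     (default: one final line break kept)
#   strip -> "text"       (final line break and blank lines removed)
#   keep  -> "text\n\n"   (final line break and trailing blank line kept)
clip: |
  text

strip: |-
  text

keep: |+
  text

end: marker
```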

Conclusion

We learned how to represent strings in YAML. We also learned about the different string styles and when to use them.