Regular Expressions
Learn how to use regular expressions to describe regular languages.
We'll cover the following
Regular expressions are a convenient notation for representing strings that match simple text patterns. The expression illustrates two of the four regular expression operations, namely concatenation (via juxtaposition) and Kleene star. The others appear in the following recursive definition.
Formal definition
-
These are regular expressions:
a) , representing the empty language/set,
b) , representing the one-string language/set,
c) , for each character, in the alphabet , representing the language/set
-
For regular expressions , the following are also regular expressions:
a) (union)
b) (concatenation)
c) (Kleene star)
d) (grouping)
Note: Some authors (and most programming environments) use the notation for union.
Each regular expression represents a language, which is the set of all strings matching its pattern. Formally, we denote the language associated with a regular expression, , as , so , , and for each symbol from the alphabet in use. The following expressions also hold:
The above definition is a constructive definition. For example, using , we can construct the regular expression as:
- , , and are regular expressions by rule 1–c.
- , , and are regular expressions by rule 2–c.
- is a regular expression by rule 2–b (applied three times).
We now take and construct a regular expression representing the language of ’s and ’s that contain the substring . We don’t care what precedes or follows the substring , so a regular expression for this language is .
The formal construction satisfying the definition is:
- and are regular expressions by rule 1–c.
- is a regular expression by rule 2–b (applied twice).
- is a regular expression by rule 2–a.
- is a regular expression by rule 2–d.
- is a regular expression by rule 2–c.
- is a regular expression by rule 2–b (applied twice).
Here are some more examples of regular expressions over the alphabet .
Regular Language | Regular Expression |
---|---|
Strings ending in | |
Strings that begin and end with different symbols | |
String of even length | |
Strings with an even number of ’s |
The precedence of regular expression operators parallels that of algebra:
- Grouping is the highest-precedence operator.
- Followed by Kleene star (like exponentiation).
- Followed by concatenation (like multiplication).
- Followed by union (like addition).
Note the subexpression in the last example in the above table. We know that two ’s must be introduced at a time to have an even number, but they don’t have to be consecutive, so we insert a between the two ’s.
Remember: Regular expressions constitute a pattern language for text processing.
While we are accustomed to using exponents as a shorthand for repetition, using variables as exponents is not allowed in regular expressions. For example, as a shorthand for is a regular expression because is a regular expression, but is not a regular expression.
Remember: The expression , where is a constant, is a regular expression; but , where is a variable, is not a regular expression!
Regular expressions also observe certain algebraic rules, such as a distributive law, wherein and . Order matters here, because what looks like multiplication is actually concatenation, so . The following table summarizes common rules for rewriting regular expressions.
Note: The quality of regular expressions is used to represent the equality of the corresponding languages. When we write , instead of , we mean that .
Rule | Description |
---|---|
Union is commutative | |
Union is associative | |
Union is idempotent | |
Union’s identity element is | |
Concatenation is associative | |
Concatenation’s identity element is | |
Concatenation’s nullity element is | |
Concatenation over union is distributive | |
Concatenation over union is distributive | |
Kleene star is idempotent |
Apart from the above-mentioned rules, there are some additional rules for the regular expressions mentioned below:
Get hands-on with 1300+ tech skills courses.