Home/Blog/Programming/Properties of regular languages
Home/Blog/Programming/Properties of regular languages

Properties of regular languages

12 min read
Mar 18, 2024
content
Closure properties of regular languages
Reverse of a regular language
Concatenation of two regular languages
Kleene closure
Complement of a regular language
Union of two regular languages
Intersection of two regular languages
Pumping lemma
Proving a language to be regular
Proving a language to be nonregular
Example 1
Example 2

Become a Software Engineer in Months, Not Years

From your first line of code, to your first day on the job — Educative has you covered. Join 2M+ developers learning in-demand programming skills.

A regular language is a language that can be represented by finite automata. It can be deterministic finite automata (DFA) or non deterministic finite automata (NFA). DFAs and NFAs are equivalent to each other in terms of their computational power. DFAs are NFAs by default. NFAs can be converted into their equivalent DFAs.

The following are examples of regular languages over a set of alphabets Σ={a,b}\Sigma=\{a,b\}:

  • L1={All strings that end with a b}L_1 = \{ \text{All strings that end with a } b \}
  • L2={All strings that contain bab as the substring}L_2 = \{ \text{All strings that contain } bab \text{ as the substring} \}
  • L3={All strings of even length}L_3 = \{ \text{All strings of even length} \}
  • L4={All strings having a length not more than 10}L_4 = \{ \text{All strings having a length not more than } 10 \}

In this blog, we discuss the closure properties of regular languages and pumping lemma.

Closure properties of regular languages#

Regular languages are closed under many set-theoretic operations including reversal, concatenation, Kleene closure, complement, union, and intersection. Let’s discuss and prove why regular languages are closed under all these properties.

Reverse of a regular language#

Regular languages are closed under the reversal operation. The reverse of a string, ww, is notated as wRw^R. If wLw \in L, then wRLRw^R \in L^R. If LL is a regular language, then there exists an NFA to recognize LL. An NFA can be constructed to recognize LRL^R if an NFA for LL is given using the following procedure:

  1. Create a new state and add ϵ\epsilon-transitions from the final states of the NFA to the new state.
  2. Make the existing final states nonfinal.
  3. Make the existing start state of the NFA the final state of the new NFA.
  4. Make the newly added state the start state of the NFA being constructed.
  5. Finally, invert the direction of all the transitions.

The NFA that recognizes LL is shown below:

NFA for L
NFA for L

The resultant NFA that recognizes LRL^R is shown below.

NFA for reversal of L
NFA for reversal of L

Concatenation of two regular languages#

Regular languages are closed under the concatenation operation. If two languages are regular then there exist two NFAs to recognize these languages. We create a new NFA having ϵ\epsilon-transition(s) from the final state(s) of the first NFA to the start state of the second NFA and remove the final status of all final states of the first NFA. The initial state of the second NFA will no longer be the initial state of the resultant NFA. The resultant NFA recognizes the concatenation of two given languages whose NFAs have been used to construct the new NFA.

Concatenation of two regular languages
Concatenation of two regular languages

Kleene closure #

Regular languages are closed under Kleene closures. The Kleene closure of a language (or a set) is defined as follows:

L=i0Li=L0L1L2L3...L^* = \bigcup_{i\geq0}L^i=L^0 \cup L^1\cup L^2 \cup L^3\cup ...

Here,

L0=ϵL^0={\epsilon}

and

L1=LL^1=L

LiL^i is recursively defined as follows:

Li={uw:uLi1 and wL} for i2L^i=\{uw:u \in L^{i-1} \text{ and }w \in L\} \text{ for }i\geq2

The following figure presents a DFA for a regular language LL:

DFA for a regular language
DFA for a regular language

Its Kleene closure (LL^*) is computed by the construction of the NFA, as follows:

  • Add ϵ\epsilon-transition(s) from the start state to the final state(s).
  • Add ϵ\epsilon-transition(s) from the final state(s) to the start state.
NFA for the Kleene closure
NFA for the Kleene closure

Complement of a regular language#

Regular languages are closed under the complement operation. If a language is regular, then there exists a DFA to recognize it. All strings whose consumption ends up in a final state are part of the language. And all strings whose consumption ends up in a nonfinal state are part of the complement of the language. So, to construct a DFA for the complement, all final states are made nonfinal and vice versa. The resulting DFA recognizes the complement of the language. The following figure represents a DFA recognizing a regular language.

DFA for a regular language
DFA for a regular language

The following figure presents the DFA that recognizes the complement of the given language.

DFA for the complement of the given regular language
DFA for the complement of the given regular language

Union of two regular languages#

Regular languages are closed under the union operation. If two languages are regular then there exist two NFAs to recognize these languages. We create a new NFA having ϵ\epsilon-transitions from its start state to the start states of the given NFAs. The resultant NFA recognizes the union of two given languages whose NFAs have been used to construct the new NFA.

Union of two regular languages
Union of two regular languages

Intersection of two regular languages#

Regular languages are closed under the intersection operation. The intersection of two languages (or sets) is defined as follows using DeMorgan’s law:

L1L2=L1L2L_1 \cap L_2 = \overline{\overline{L_1}\cup \overline{L_2}}

Let’s deconstruct the expression above:

  • If L1L_1 and L2L_2 are regular languages, then L1\overline{L_1} and L2\overline{L_2} are regular because regular languages are closed under the complement operation.
  • If L1\overline{L_1} and L2\overline{L_2} are regular, then L1L2\overline{L_1} \cup \overline{L_2} is regular because regular languages are closed under the union operation.
  • If L1L2\overline{L_1} \cup \overline{L_2} is regular, then L1L2\overline{\overline{L_1}\cup \overline{L_2}} is regular because regular languages are closed under the complement operation.
  • If L1L2\overline{\overline{L_1}\cup \overline{L_2}} is regular, then L1L2L_1 \cap L_2 is regular because they are equal according to DeMorgan’s law.

Pumping lemma#

Pumping lemma is an important and interesting property of regular languages.

If a language is regular, then there exists a DFA that recognizes the language. Let’s notate the number of states of the DFA as mm where mm is a finite positive integer.

For an infinite regular language LL, the DFA must contain at least one nonempty loop. If wLw \in L and wm|w| \geq m, then ww can be written as follows:

w=xyzw = xyz

such that xym|xy| \leq m and y1|y| \geq 1. Because wm|w| \geq m and wLw \in L, there must exist a walk from the start state to one of the final states of the DFA to consume ww. The number of states in the DFA is mm. So the longest walk without repeating a state can be of m1m-1 length. To consume ww, at least one of the states must be repeated, otherwise ww can’t be accepted. The repeating part of the walk makes the substring yy. The part of the string before the repeated part makes the substring xx, and the part of the string after the repeated part makes the substring zz. The following figure shows a breakdown of the substrings that make up a valid string ww to be accepted by the DFA:

Composition of w into three substrings
Composition of w into three substrings

The pumping lemma states that if w=xyzw=xyz and wLw \in L, then wyizL,i0wy^iz \in L, \forall i \geq 0. It implies that if a walk from the start state to the final state is decomposed into three substrings, then there are two types of behaviors being observed: repeating parts of the walk and nonrepeating parts of the walk. Here, yy is the repeating part. If yy repeats itself ii times then the resulting walk from the start state to the final state remains part of the language. Therefore, all forms of the walk, where yy is repeating i0i \geq 0 times are part of the language. For instance, xzxz, xyzxyz, xyyzxyyz, xyyyzxyyyz, and so on, will be part of the language.

Proving a language to be regular#

A regular language can be proved to be regular using different methods including:

  • Construct a DFA or NFA
  • Write a regular expression
  • Use closure properties
  • Use an appropriate logical combination of these methods

For example, let’s prove that the following language is regular:

L={w:w{a,b} and the length of w is not a multiple of 3}L=\{ w: w \in \{ a, b \}^* \text{ and the length of } w \text{ is not a multiple of 3} \}

One way to prove that LL is regular is as follows:

  • Construct a DFA for the language LL' defined as:

L={w:w{a,b} and length of w is a multiple of 3}L'=\{ w: w \in \{ a, b \}^* \text{ and length of } w \text{ is a multiple of 3} \}

DFA for L'
DFA for L'
  • LL' is regular because it has a DFA that recognizes it. We know that regular languages are closed under the complement operations. Therefore, L=L\overline{L'}=L is also regular.

It's important to note here that pumping lemma can't be used to prove a language to be regular because we can't demonstrate that xyizL,i0xy^iz \in L, \forall i \geq0.

Proving a language to be nonregular#

To prove a language to be non-regular, we need to find a string w=xyzw=xyz and an ii where xyzLxyz \in L but xyizLxy^iz \notin L, while fulfilling individual constraints on ww and its substrings. This implies that the given statement of the pumping lemma is not true for all ii.

Let’s try to understand how to prove that a language is not regular. One effective approach is to apply the pumping lemma. If LL is a regular infinite language, w=xyzLw=xyz \in L, wm,xym|w| \geq m, |xy| \leq m, and y1|y| \geq 1, then xyizL,i0xy^iz \in L, \forall i \geq 0.

Let’s understand this through two examples:

Example 1#

Our task is to prove that the following language is not regular:

L5={anbn:n0}L_5=\{a^nb^n:n \geq 0 \}

As n0n \geq 0, therefore, L5L_5 is infinite. Assume that L5L_5 is regular. If it is regular then there exists a DFA that recognizes L5L_5. The number of states in the (assumed) DFA will be a finite integer, say mm.

Let’s choose a string wL5w \in L_5, such that wm|w| \geq m.

w=ambmw=a^mb^m

Here, w=2mm|w|=2m \geq m. The chosen string ww can be decomposed into three substrings xx, yy, and zz such that xym|xy| \leq m and y1|y| \geq 1. Let’s represent the length of yy as kk, where 1km1 \leq k \leq m. In this string, xx and yy both have to fall within aa's because their combined length can’t go beyond mm. Let’s repeat yy twice and create a new string ww'.

w=xy2z=xyyzw'=xy^2z=xyyz

w=w+y=m+k|w'|=|w|+|y|=m+k

Because yy is a composition of kk number of aa's, the resulting string will be:

w=am+kbmL5w' = a^{m+k}b^m \notin L_5

According to the pumping lemma, wL5w' \in L_5, but we 've demonstrated that wL5w' \notin L_5. Therefore, our assumption is incorrect and the given language is not regular.

Example 2#

Our task is to prove that the following language is not regular:

L6={an:n0 and n is a perfect square}L_6=\{a^n:n \geq 0 \text{ and } n \text{ is a perfect square} \}

Because n0n \geq 0, therefore, L6L_6 is infinite. Assume that L6L_6 is regular. If it’s regular, then there exists a DFA that recognizes L6L_6. The number of states in the (assumed) DFA will be a finite integer, say mm.

Let’s choose a string wL6w \in L_6, such that wm|w| \geq m.

w=am2w=a^{m^2}

Here, w=m2m|w|=m^2 \geq m. The chosen string ww can be decomposed into three substrings, xx, yy, and zz, such that xym|xy| \leq m and y1|y| \geq 1. Let’s represent the length of yy as kk where 1km1 \leq k \leq m. In this string, xx and yy both have to fall within the first mm number of aa's because their combined length can’t go beyond mm. Let’s repeat yy twice and create a new string ww'.

w=xyiz=xyyzw'=xy^iz=xyyz

w=w+y=m2+km2+m|w'|=|w|+|y|=m^2+k \leq m^2 + m

The lower limit on the length of the new string ww' is m2+1m^2 + 1. The upper limit on the length of the new string ww' is m2+mm^2+m.

m2+m<m2+2m+1=(m+1)2m^2 + m < m^2+2m+1 = (m+1)^2

Even the upper bound of w|w'| falls below the next perfect square, i.e. (m+1)2(m+1)^2.

According to the pumping lemma, wL6w' \in L_6, but we’ve demonstrated that wL6w' \notin L_6. Therefore, our assumption is incorrect and the given language is not regular.

For an in-depth understanding of different classes of computational problems and their characterization, you can explore the following course:

Theory of Computation

Cover
Theory of Computation

What are the mathematics behind computers? What are the theoretical foundations of computer languages? Such questions can be explored by understanding formal languages and automata models. In this comprehensive course, you’ll explore formal languages, regular languages, and how to model them with their associated automata models. Next, you’ll cover regular expressions and grammar, and their equivalents. Then, you’ll explore context-free languages (the foundations of programming languages), pushdown automata, and the relationship between them. Toward the end, you’ll learn about recursively enumerable languages, Turing machines with their variations, context-sensitive languages, and unrestricted grammars. By the end of the course, you’ll have gained a deep understanding of several formal languages and their associated automata. Also, since there are plenty of exercises throughout the course, your understanding and problem-solving skills will be thoroughly reinforced.

15hrs
Beginner
10 Playgrounds
9 Quizzes

Written By:
Malik Jahan
Join 2.5 million developers at
Explore the catalog

Free Resources