
Properties of regular languages

12 min read

Mar 18, 2024

content

Closure properties of regular languages

Reverse of a regular language

Concatenation of two regular languages

Kleene closure

Complement of a regular language

Union of two regular languages

Intersection of two regular languages

Pumping lemma

Proving a language to be regular

Proving a language to be nonregular

Example 1

Example 2


$L_1 = \{ \text{All strings that end with a } b \}$
$L_2 = \{ \text{All strings that contain } bab \text{ as the substring} \}$
$L_3 = \{ \text{All strings of even length} \}$
$L_4 = \{ \text{All strings having a length not more than } 10 \}$

In this blog, we discuss the closure properties of regular languages and the pumping lemma.

Regular languages are closed under the reversal operation. The reverse of a string, $w$ , is denoted by $w^R$ , and the reverse of a language is $L^R = \{w^R : w \in L\}$ . If $L$ is a regular language, then there exists an NFA to recognize $L$ . Given an NFA for $L$ , an NFA that recognizes $L^R$ can be constructed using the following procedure:

Create a new state and add $\epsilon$ -transitions from the final states of the NFA to the new state.
Make the existing final states nonfinal.
Make the existing start state of the NFA the final state of the new NFA.
Make the newly added state the start state of the NFA being constructed.
Finally, invert the direction of all the transitions.
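The procedure above can be sketched in Python. This is a minimal sketch under stated assumptions, not a definitive implementation: the NFA is assumed to be stored as a dictionary mapping `(state, symbol)` to a set of states, the empty string `""` stands for $\epsilon$ , and the names `reverse_nfa`, `accepts`, and `q_new` are hypothetical.

```python
def reverse_nfa(start, finals, delta, new_start="q_new"):
    """Return (start, finals, delta) of an NFA for the reversed language.

    delta maps (state, symbol) -> set of states; "" stands for epsilon.
    """
    rev = {}
    # Invert the direction of every existing transition.
    for (src, sym), dsts in delta.items():
        for dst in dsts:
            rev.setdefault((dst, sym), set()).add(src)
    # After inversion, the epsilon-transitions "old finals -> new state"
    # point from the new start state to every old final state.
    rev[(new_start, "")] = set(finals)
    # The old start state is the only final state of the new NFA.
    return new_start, {start}, rev


def accepts(start, finals, delta, word):
    """Simulate an epsilon-NFA on word."""
    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            q = stack.pop()
            for r in delta.get((q, ""), ()):
                if r not in seen:
                    seen.add(r)
                    stack.append(r)
        return seen

    current = closure({start})
    for ch in word:
        moved = set()
        for q in current:
            moved |= delta.get((q, ch), set())
        current = closure(moved)
    return bool(current & finals)


# Hypothetical NFA for "strings over {a, b} that end with ab".
delta = {("s", "a"): {"s", "t"}, ("s", "b"): {"s"}, ("t", "b"): {"f"}}
r_start, r_finals, r_delta = reverse_nfa("s", {"f"}, delta)

print(accepts(r_start, r_finals, r_delta, "baab"))  # True: starts with "ba"
print(accepts(r_start, r_finals, r_delta, "ab"))    # False
```

The reversed NFA accepts exactly the strings that start with $ba$ , which are the reverses of the strings that end with $ab$ .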

The NFA that recognizes $L$ is shown below:

Regular languages are closed under the concatenation operation. If two languages are regular, then there exist two NFAs to recognize these languages. We create a new NFA by adding $\epsilon$ -transition(s) from the final state(s) of the first NFA to the start state of the second NFA and removing the final status of all final states of the first NFA. The start state of the first NFA becomes the start state of the resultant NFA, so the initial state of the second NFA is no longer an initial state. The resultant NFA recognizes the concatenation of the two given languages.
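Here is a small Python sketch of this construction. It assumes each NFA is a tuple `(start, finals, delta)` with `delta` mapping `(state, symbol)` to a set of states, `""` standing for $\epsilon$ , and disjoint state names in the two NFAs. The function names and the two example NFAs are hypothetical.

```python
def concat_nfa(nfa1, nfa2):
    """Build an NFA for the concatenation of the two given languages."""
    start1, finals1, delta1 = nfa1
    start2, finals2, delta2 = nfa2
    delta = {key: set(dsts) for key, dsts in delta1.items()}
    for key, dsts in delta2.items():
        delta.setdefault(key, set()).update(dsts)
    # Epsilon-transitions from every final state of the first NFA
    # to the start state of the second NFA.
    for f in finals1:
        delta.setdefault((f, ""), set()).add(start2)
    # Only the final states of the second NFA stay final.
    return start1, set(finals2), delta


def accepts(nfa, word):
    """Simulate an epsilon-NFA on word."""
    start, finals, delta = nfa

    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            q = stack.pop()
            for r in delta.get((q, ""), ()):
                if r not in seen:
                    seen.add(r)
                    stack.append(r)
        return seen

    current = closure({start})
    for ch in word:
        moved = set()
        for q in current:
            moved |= delta.get((q, ch), set())
        current = closure(moved)
    return bool(current & finals)


# Hypothetical inputs: the first NFA accepts a*, the second accepts only "b".
a_star = ("p", {"p"}, {("p", "a"): {"p"}})
just_b = ("q0", {"q1"}, {("q0", "b"): {"q1"}})
both = concat_nfa(a_star, just_b)

print(accepts(both, "aab"))  # True
print(accepts(both, "aba"))  # False
```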

Regular languages are closed under Kleene closures. The Kleene closure of a language (or a set) is defined as follows:

L^* = \bigcup_{i\geq0}L^i=L^0 \cup L^1\cup L^2 \cup L^3\cup ...

Here,

L^0=\{\epsilon\}

and

L^1=L

$L^i$ is recursively defined as follows:

L^i=\{uw:u \in L^{i-1} \text{ and }w \in L\} \text{ for }i\geq2
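To make the definition concrete, the following sketch computes $L^i$ for a small finite language and the union of the first few powers. The language `{"a", "bb"}` and the helper names `power` and `star_up_to` are assumptions chosen only for illustration. Since $L^*$ is infinite here, only a bounded prefix of the union can be listed.

```python
def power(lang, i):
    """L^i for a finite language given as a set of strings."""
    result = {""}  # L^0 = {epsilon}
    for _ in range(i):
        # L^i = {uw : u in L^(i-1) and w in L}
        result = {u + w for u in result for w in lang}
    return result


def star_up_to(lang, max_power):
    """Union of L^0, L^1, ..., L^max_power (a finite part of L*)."""
    out = set()
    for i in range(max_power + 1):
        out |= power(lang, i)
    return out


L = {"a", "bb"}
print(sorted(power(L, 2)))
# ['aa', 'abb', 'bba', 'bbbb']
print(sorted(star_up_to(L, 2), key=lambda s: (len(s), s)))
# ['', 'a', 'aa', 'bb', 'abb', 'bba', 'bbbb']
```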

The following figure presents a DFA for a regular language $L$ :

Regular languages are closed under the complement operation. If a language is regular, then there exists a DFA to recognize it. All strings whose consumption ends up in a final state are part of the language. And all strings whose consumption ends up in a nonfinal state are part of the complement of the language. So, to construct a DFA for the complement, all final states are made nonfinal and vice versa. The resulting DFA recognizes the complement of the language. The following figure represents a DFA recognizing a regular language.
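The swap of final and nonfinal states can be sketched as follows. The sketch assumes a complete DFA, that is, one with a transition for every state and every symbol, stored as a dictionary from `(state, symbol)` to a single state. The example DFA for "strings that end with a $b$ " and all names are hypothetical.

```python
def run_dfa(start, finals, delta, word):
    """Return True if the DFA ends in a final state after consuming word."""
    state = start
    for ch in word:
        state = delta[(state, ch)]
    return state in finals


def complement_dfa(states, start, finals, delta):
    """Make every final state nonfinal and vice versa."""
    return states, start, states - finals, delta


# Hypothetical DFA: strings over {a, b} that end with a b.
STATES = {"q0", "q1"}
DELTA = {
    ("q0", "a"): "q0", ("q0", "b"): "q1",
    ("q1", "a"): "q0", ("q1", "b"): "q1",
}
_, c_start, c_finals, c_delta = complement_dfa(STATES, "q0", {"q1"}, DELTA)

print(run_dfa("q0", {"q1"}, DELTA, "aab"))          # True
print(run_dfa(c_start, c_finals, c_delta, "aab"))   # False
print(run_dfa(c_start, c_finals, c_delta, "aba"))   # True
```

The construction relies on the automaton being deterministic and complete. Flipping the final states of an NFA does not, in general, give the complement.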

Regular languages are closed under the intersection operation. The intersection of two languages (or sets) can be expressed as follows using De Morgan’s law:

L_1 \cap L_2 = \overline{\overline{L_1}\cup \overline{L_2}}

Let’s deconstruct the expression above:

If $L_1$ and $L_2$ are regular languages, then $\overline{L_1}$ and $\overline{L_2}$ are regular because regular languages are closed under the complement operation.
If $\overline{L_1}$ and $\overline{L_2}$ are regular, then $\overline{L_1} \cup \overline{L_2}$ is regular because regular languages are closed under the union operation.
If $\overline{L_1} \cup \overline{L_2}$ is regular, then $\overline{\overline{L_1}\cup \overline{L_2}}$ is regular because regular languages are closed under the complement operation.
If $\overline{\overline{L_1}\cup \overline{L_2}}$ is regular, then $L_1 \cap L_2$ is regular because they are equal according to DeMorgan’s law.
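As a sanity check of the identity used above (not a proof of closure), the following sketch compares both sides on all strings over $\{a, b\}$ up to a small length. The length bound and the choice of languages, strings ending with a $b$ and strings of even length, are assumptions made for illustration. Complements are taken relative to this bounded universe.

```python
from itertools import product

SIGMA = "ab"
MAX_LEN = 4  # hypothetical bound, so the universe stays finite

# Every string over SIGMA with length 0..MAX_LEN.
universe = {
    "".join(p) for n in range(MAX_LEN + 1) for p in product(SIGMA, repeat=n)
}

L1 = {w for w in universe if w.endswith("b")}   # ends with a b
L3 = {w for w in universe if len(w) % 2 == 0}   # even length


def complement(lang):
    """Complement relative to the bounded universe."""
    return universe - lang


via_de_morgan = complement(complement(L1) | complement(L3))
print(via_de_morgan == (L1 & L3))  # True
```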

The pumping lemma is an important and interesting property of regular languages.

If a language is regular, then there exists a DFA that recognizes the language. Let’s denote the number of states of the DFA by $m$ , where $m$ is a finite positive integer.

For an infinite regular language $L$ , the DFA must contain at least one nonempty loop. If $w \in L$ and $|w| \geq m$ , then $w$ can be written as follows:

w = xyz

such that $|xy| \leq m$ and $|y| \geq 1$ . Because $w \in L$ , there must exist a walk from the start state to one of the final states of the DFA that consumes $w$ . The number of states in the DFA is $m$ , so the longest walk that doesn’t repeat a state visits $m$ states and has length $m-1$ . Because $|w| \geq m$ , at least one of the states must be repeated while consuming $w$ , and the first repetition happens within the first $m$ symbols, which gives $|xy| \leq m$ . The repeating part of the walk makes the substring $y$ . The part of the string before the repeated part makes the substring $x$ , and the part of the string after the repeated part makes the substring $z$ . The following figure shows a breakdown of the substrings that make up a valid string $w$ to be accepted by the DFA:

The pumping lemma states that if $w=xyz$ and $w \in L$ , then $xy^iz \in L, \forall i \geq 0$ . It implies that if a walk from the start state to the final state is decomposed into three substrings, then there are two types of behaviors being observed: repeating parts of the walk and nonrepeating parts of the walk. Here, $y$ is the repeating part. If $y$ repeats itself $i$ times, then the resulting walk from the start state to the final state remains part of the language. Therefore, all forms of the walk where $y$ is repeated $i \geq 0$ times are part of the language. For instance, $xz$ , $xyz$ , $xyyz$ , $xyyyz$ , and so on, will be part of the language.
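The decomposition can be found mechanically by following the walk and stopping at the first repeated state. The sketch below assumes a complete DFA stored as a dictionary from `(state, symbol)` to a state. The example DFA (strings ending with a $b$ , so $m=2$ ) and the names `decompose` and `run_dfa` are hypothetical.

```python
def run_dfa(start, finals, delta, word):
    """Return True if the DFA ends in a final state after consuming word."""
    state = start
    for ch in word:
        state = delta[(state, ch)]
    return state in finals


def decompose(start, delta, word):
    """Split word into (x, y, z) at the first repeated state of the walk."""
    seen = {start: 0}  # state -> number of symbols consumed when first reached
    state = start
    for pos, ch in enumerate(word, start=1):
        state = delta[(state, ch)]
        if state in seen:
            first = seen[state]
            # x leads to the repeated state, y is the loop, z is the rest.
            return word[:first], word[first:pos], word[pos:]
        seen[state] = pos
    return None  # no repetition: word is shorter than the number of states


# Hypothetical DFA with m = 2 states: strings over {a, b} that end with a b.
START, FINALS = "q0", {"q1"}
DELTA = {
    ("q0", "a"): "q0", ("q0", "b"): "q1",
    ("q1", "a"): "q0", ("q1", "b"): "q1",
}

x, y, z = decompose(START, DELTA, "bab")
print((x, y, z))  # ('', 'ba', 'b')
for i in range(4):
    pumped = x + y * i + z
    print(pumped, run_dfa(START, FINALS, DELTA, pumped))
```

Every pumped string printed by the loop is accepted, which is what the lemma predicts for a regular language.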

A language can be proved to be regular using different methods, including:

Construct a DFA or NFA
Write a regular expression
Use closure properties
Use an appropriate logical combination of these methods

For example, let’s prove that the following language is regular:

L=\{ w: w \in \{ a, b \}^* \text{ and the length of } w \text{ is not a multiple of 3} \}

One way to prove that $L$ is regular is as follows:

Construct a DFA for the language $L'$ defined as:

L'=\{ w: w \in \{ a, b \}^* \text{ and length of } w \text{ is a multiple of 3} \}
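One plausible way to carry this step through, offered here as an assumption rather than as the author’s stated proof, is to build a three-state DFA for $L'$ that counts the length of the input modulo 3 and then use closure under complement, because $L = \overline{L'}$ . The state names and helper below are hypothetical.

```python
# States 0, 1, 2 record the length of the consumed input modulo 3.
STATES = {0, 1, 2}
START = 0
DELTA = {(q, ch): (q + 1) % 3 for q in STATES for ch in "ab"}

FINALS_L_PRIME = {0}                  # length is a multiple of 3
FINALS_L = STATES - FINALS_L_PRIME    # complement: swap final and nonfinal


def accepts(finals, word):
    """Run the shared DFA and test membership against the given final states."""
    state = START
    for ch in word:
        state = DELTA[(state, ch)]
    return state in finals


print(accepts(FINALS_L_PRIME, "aba"))  # True: length 3
print(accepts(FINALS_L, "aba"))        # False
print(accepts(FINALS_L, "ab"))         # True: length 2
```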

To prove a language to be nonregular, we need to find a string $w \in L$ with $|w| \geq m$ such that, for every decomposition $w=xyz$ that fulfills the constraints on its substrings, there is an $i$ where $xy^iz \notin L$ . This contradicts the pumping lemma, which requires $xy^iz \in L$ for all $i$ .

Let’s try to understand how to prove that a language is not regular. One effective approach is to apply the pumping lemma. If $L$ is a regular infinite language, $w=xyz \in L$ , $|w| \geq m, |xy| \leq m$ , and $|y| \geq 1$ , then $xy^iz \in L, \forall i \geq 0$ .

Our task is to prove that the following language is not regular:

L_5=\{a^nb^n:n \geq 0 \}

Because $n$ ranges over all integers $n \geq 0$ , $L_5$ is infinite. Assume that $L_5$ is regular. If it’s regular, then there exists a DFA that recognizes $L_5$ . The number of states in the (assumed) DFA will be a finite integer, say $m$ .

Let’s choose a string $w \in L_5$ , such that $|w| \geq m$ .

w=a^mb^m

Here, $|w|=2m \geq m$ . The chosen string $w$ can be decomposed into three substrings $x$ , $y$ , and $z$ such that $|xy| \leq m$ and $|y| \geq 1$ . Let’s represent the length of $y$ as $k$ , where $1 \leq k \leq m$ . In this string, $x$ and $y$ both have to fall within the $a$’s because their combined length can’t go beyond $m$ . Let’s repeat $y$ twice and create a new string $w'$ .

w'=xy^2z=xyyz

|w'|=|w|+|y|=2m+k

Because $y$ consists of $k$ $a$’s, the resulting string will be:

w' = a^{m+k}b^m \notin L_5

According to the pumping lemma, $w' \in L_5$ , but we’ve demonstrated that $w' \notin L_5$ . Therefore, our assumption is incorrect and the given language is not regular.
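The argument can be checked by brute force for small values of $m$ . The sketch below enumerates every decomposition of $a^mb^m$ with $|xy| \leq m$ and $|y| \geq 1$ and confirms that pumping with $i=2$ always leaves $L_5$ . The range of $m$ and the function names are assumptions for illustration. The check supports the proof but doesn’t replace it.

```python
def in_L5(s):
    """Membership test for L5 = {a^n b^n : n >= 0}."""
    n = len(s) // 2
    return s == "a" * n + "b" * n


def pumping_always_fails(m, i=2):
    """True if x y^i z is outside L5 for every valid split of a^m b^m."""
    w = "a" * m + "b" * m
    for xy_len in range(1, m + 1):           # |xy| <= m
        for y_len in range(1, xy_len + 1):   # |y| >= 1
            x = w[:xy_len - y_len]
            y = w[xy_len - y_len:xy_len]
            z = w[xy_len:]
            if in_L5(x + y * i + z):
                return False
    return True


print(all(pumping_always_fails(m) for m in range(1, 8)))  # True
```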

Our task is to prove that the following language is not regular:

L_6=\{a^n:n \geq 0 \text{ and } n \text{ is a perfect square} \}

Because there are infinitely many perfect squares $n \geq 0$ , $L_6$ is infinite. Assume that $L_6$ is regular. If it’s regular, then there exists a DFA that recognizes $L_6$ . The number of states in the (assumed) DFA will be a finite integer, say $m$ .

Let’s choose a string $w \in L_6$ , such that $|w| \geq m$ .

w=a^{m^2}

Here, $|w|=m^2 \geq m$ . The chosen string $w$ can be decomposed into three substrings, $x$ , $y$ , and $z$ , such that $|xy| \leq m$ and $|y| \geq 1$ . Let’s represent the length of $y$ as $k$ , where $1 \leq k \leq m$ . In this string, $x$ and $y$ both have to fall within the first $m$ $a$’s because their combined length can’t go beyond $m$ . Let’s repeat $y$ twice and create a new string $w'$ .

w'=xy^2z=xyyz

|w'|=|w|+|y|=m^2+k \leq m^2 + m

The lower limit on the length of the new string $w'$ is $m^2 + 1$ . The upper limit on the length of the new string $w'$ is $m^2+m$ .

m^2 + m < m^2+2m+1 = (m+1)^2

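Even the upper bound of $|w'|$ falls below the next perfect square, i.e., $(m+1)^2$ , and the lower bound lies above $m^2$ . So $m^2 < |w'| < (m+1)^2$ , which means that $|w'|$ is not a perfect square and $w' \notin L_6$ .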

According to the pumping lemma, $w' \in L_6$ , but we’ve demonstrated that $w' \notin L_6$ . Therefore, our assumption is incorrect and the given language is not regular.
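Because the alphabet has a single symbol, only lengths matter, so the bound can be checked numerically. The sketch below lists the possible lengths $m^2+k$ with $1 \leq k \leq m$ and confirms that none of them is a perfect square. The range of $m$ and the helper names are assumptions for illustration.

```python
from math import isqrt


def is_square(n):
    """True if n is a perfect square."""
    return isqrt(n) ** 2 == n


def pumped_lengths(m):
    """Possible lengths of x y^2 z for w = a^(m^2) with 1 <= |y| <= m."""
    return [m * m + k for k in range(1, m + 1)]


print(pumped_lengths(3))  # [10, 11, 12], strictly between 9 and 16
print(any(is_square(n) for m in range(1, 200) for n in pumped_lengths(m)))
# False
```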


Written By:

Malik Jahan