Equivalence of Regular Expressions and Regular Languages
Learn about the process of converting a regular expression into an equivalent finite automaton.
Regular expressions are hard to find
Not all languages lend themselves to the easy discovery of a regular expression to describe them. Consider the language of strings over {a, b} containing an even number of a's and an even number of b's. We know that each letter must occur in pairs, but not necessarily in consecutive pairs. In addition, the a's and b's can be introduced independently, and having no a's and/or no b's is valid, since zero is an even number. The answer is not easy to come up with by ad hoc means; one expression that works is (aa ∪ bb ∪ (ab ∪ ba)(aa ∪ bb)*(ab ∪ ba))*.
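Before tackling an expression, it helps to pin down membership in this language directly; the minimal Python sketch below just tracks the two parities (it is an illustration, not part of the construction that follows, and the function name is an arbitrary choice).

```python
def in_even_even_language(s: str) -> bool:
    """True if s (over the alphabet {a, b}) has an even number of a's and an even number of b's."""
    even_a, even_b = True, True          # zero occurrences so far, and zero is even
    for ch in s:
        if ch == "a":
            even_a = not even_a
        elif ch == "b":
            even_b = not even_b
        else:
            return False                 # symbol outside the alphabet
    return even_a and even_b

print(in_even_even_language(""))      # True: no a's and no b's
print(in_even_even_language("abba"))  # True: two a's, two b's
print(in_even_even_language("aab"))   # False: only one b
```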
It is especially difficult to come up with a regular expression for the complement of the language described by a given regular expression: regular expressions describe what is in a language, not what is outside of it. Fortunately, we can convert any finite automaton to a regular expression and vice versa.
To show that regular expressions represent the class of regular languages, we’ll demonstrate the following:
- For every regular expression, there is an NFA (and, therefore, a DFA) that accepts the same language.
- For every finite automaton (DFA or NFA), there is a regular expression that describes the language the automaton accepts.
From regular expression to NFA
To show that there is an automaton for every regular expression, it is sufficient to exhibit an automaton for each component of the formal definition of regular expressions. The NFAs in the figure below represent the base cases.
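As a concrete companion to the figure, the sketch below shows one way the base cases could be encoded in Python. The NFA dataclass, the LAMBDA marker for lambda transitions, and the state names are illustrative assumptions, not a prescribed implementation.

```python
from dataclasses import dataclass, field

LAMBDA = None  # stands for a lambda (empty-string) transition label

@dataclass
class NFA:
    states: set       # all state names
    start: str        # the initial state
    accept: set       # the accepting (final) states
    # delta maps (state, symbol) -> set of next states; the symbol may be LAMBDA
    delta: dict = field(default_factory=dict)

def nfa_empty_language() -> NFA:
    """Base case: accepts no strings at all (a lone, non-accepting state)."""
    return NFA(states={"q0"}, start="q0", accept=set())

def nfa_empty_string() -> NFA:
    """Base case: accepts only the empty string (the start state is accepting)."""
    return NFA(states={"q0"}, start="q0", accept={"q0"})

def nfa_symbol(a: str) -> NFA:
    """Base case: accepts exactly the one-symbol string a."""
    return NFA(states={"q0", "q1"}, start="q0", accept={"q1"},
               delta={("q0", a): {"q1"}})
```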
We now need to show how to construct the union, concatenation, and Kleene star of arbitrary NFAs. The grouping operation is merely a notational convenience for regular expressions and is not part of the proof.
To construct the union of two NFAs, we create a new initial state and add lambda transitions from it to the start states of each of the NFAs, allowing a choice of which machine to use. The following figure illustrates the process for the initial states of the two NFAs in question.
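In the Python sketch from above (reusing the NFA dataclass and LAMBDA marker), the union construction might look as follows; the relabel helper and the name u0 for the new initial state are illustrative choices whose only job is to keep the two machines' state names from colliding.

```python
def relabel(m: NFA, tag: str) -> NFA:
    """Copy m with every state name prefixed by tag, so two machines never share state names."""
    def ren(q):
        return f"{tag}:{q}"
    delta = {(ren(q), a): {ren(p) for p in ps} for (q, a), ps in m.delta.items()}
    return NFA(states={ren(q) for q in m.states}, start=ren(m.start),
               accept={ren(q) for q in m.accept}, delta=delta)

def nfa_union(m1: NFA, m2: NFA) -> NFA:
    """New initial state with lambda transitions to the start states of m1 and m2."""
    m1, m2 = relabel(m1, "L"), relabel(m2, "R")
    start = "u0"
    delta = {**m1.delta, **m2.delta, (start, LAMBDA): {m1.start, m2.start}}
    return NFA(states={start} | m1.states | m2.states,
               start=start,
               accept=m1.accept | m2.accept,   # a string is accepted if either machine accepts it
               delta=delta)
```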
To construct the concatenation of two NFAs, we place a lambda transition from each accepting state of the first machine to the start state of the second and make those accepting states of the first machine nonfinal. Of course, the start state of the second machine is also no longer a start state. See the figure below, which assumes, without loss of generality, that the first machine has two final states.
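Continuing the same sketch (and reusing relabel from the union sketch), concatenation adds lambda transitions from the first machine's accepting states to the second machine's start state and demotes those accepting states.

```python
def nfa_concat(m1: NFA, m2: NFA) -> NFA:
    """Accept exactly the strings formed by a string of m1 followed by a string of m2."""
    m1, m2 = relabel(m1, "A"), relabel(m2, "B")
    delta = {**m1.delta, **m2.delta}
    for f in m1.accept:
        # lambda transition from each accepting state of the first machine
        # to the start state of the second
        delta.setdefault((f, LAMBDA), set()).add(m2.start)
    return NFA(states=m1.states | m2.states,
               start=m1.start,          # only the first machine's start state remains a start state
               accept=m2.accept,        # the first machine's accepting states become nonfinal
               delta=delta)
```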
Constructing the Kleene star of an NFA involves two considerations:
- If the empty string is not already accepted, we must make it accepted.
- We must allow arbitrary “restarts” of the NFA.
If the initial state of the NFA is already an accepting state, the first consideration above is fulfilled. If not, we can safely make the start state accepting only when it has no incoming edges from other states; otherwise, making it accepting would change the language of the machine. In such a case, we instead create a new accepting start state with a lambda transition to the original start state.
To allow arbitrary repetition of a machine, we add lambda transitions from each accepting state to the original initial state, since repetition is just concatenation with itself.
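Under the same assumptions, a sketch of the star construction is below; for simplicity it always introduces a fresh accepting start state with a lambda transition into the machine, which is safe even in the cases where the original start state could simply have been marked accepting.

```python
def nfa_star(m: NFA) -> NFA:
    """Accept zero or more repetitions of strings accepted by m."""
    m = relabel(m, "S")
    start = "s0"
    delta = dict(m.delta)
    delta[(start, LAMBDA)] = {m.start}   # new accepting start state: the empty string is accepted
    for f in m.accept:
        # lambda transition back to the original start state allows arbitrary restarts
        delta.setdefault((f, LAMBDA), set()).add(m.start)
    return NFA(states={start} | m.states,
               start=start,
               accept={start} | m.accept,
               delta=delta)
```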
This constitutes a constructive proof of the first part of Kleene’s theorem. Let’s see it in action by constructing an NFA for the regular expression . We begin with an NFA accepting the singleton string .
Next, we allow zero or more ’s (we are omitting a lambda transition to a separate machine here for simplicity).
Now we concatenate another .
Next, we allow the alternative of just accepting ...
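Putting the constructors from the earlier sketches together, the snippet below builds and simulates a machine for an illustrative expression, ab*a ∪ a, which follows the same steps as this walkthrough: a single symbol, a starred symbol, a concatenation, and a union alternative. The expression and the simulator names are assumptions made for the example.

```python
def lambda_closure(m: NFA, states: set) -> set:
    """All states reachable from `states` using only lambda transitions."""
    closure, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for p in m.delta.get((q, LAMBDA), set()):
            if p not in closure:
                closure.add(p)
                stack.append(p)
    return closure

def accepts(m: NFA, s: str) -> bool:
    """Simulate the NFA on s by tracking the set of currently reachable states."""
    current = lambda_closure(m, {m.start})
    for ch in s:
        moved = set()
        for q in current:
            moved |= m.delta.get((q, ch), set())
        current = lambda_closure(m, moved)
    return bool(current & m.accept)

# Illustrative expression: (a concatenated with b* concatenated with a) union a
machine = nfa_union(
    nfa_concat(nfa_concat(nfa_symbol("a"), nfa_star(nfa_symbol("b"))), nfa_symbol("a")),
    nfa_symbol("a"),
)
print(accepts(machine, "abbba"))  # True:  a, then three b's, then a
print(accepts(machine, "a"))      # True:  the union alternative
print(accepts(machine, "ab"))     # False: matches neither alternative
```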