Chomsky Normal Form
Learn about Chomsky normal form (CNF) and the process of removing lambda and unit productions.
We'll cover the following...
Now we’ll turn our attention toward taking a deeper look at the closure and other properties of context-free languages, developing algorithms for making decisions about any context-free language, and finally, introducing a formal language that is not context-free, eventually leading to the third and final class of formal languages. We’ll begin with a procedure for converting any CFG into a special form, called Chomsky normal form (CNF), which will facilitate deriving certain results about context-free languages.
Preparation for Chomsky normal form
In preparation for Chomsky normal form, we need to make two modifications to any CFG under consideration:
- Remove lambda from all rules without changing the language generated (except we may lose the empty string, which is okay).
- Remove unit productions, such as , while maintaining their effects.
There are practical reasons for removing lambda from a CFG. To process a string efficiently to see whether it is generated by a CFG, it is better if the intermediate sentential forms of the working derivation do not decrease in size. If lambda is a valid replacement for a variable, then the working derivation string can shrink, so it is possible that a working derivation can grow and shrink and grow and shrink repeatedly, which is inefficient. A grammar can maintain the effect of the empty string in its rules without explicitly using lambda so that the sentential forms don’t shrink. Such a grammar is called noncontracting. The only downside is that the modified grammar will no longer generate the empty string, but that is a special case not worth worrying about most of the time.
Removing lambda
To remove lambda while maintaining its contribution in forming nonempty strings, we first identify which variables can generate the empty string, whether directly or indirectly. Such variables are called nullable variables, and, once identified, we incorporate their nullable effects into the rules of the grammar, removing any explicit mention of lambda. The language recognized by the modified grammar will be , where represents the language recognized by the original grammar.
To illustrate this, we’ll remove lambda from the following grammar:
The variable is the only variable that goes to lambda directly, but is also nullable because of the production . To substitute the effects of both and being nullable, we change the grammar like this:
In the first line, we added the rule from substituting lambda for in the rule . Likewise, we obtain the rule from since is nullable. The goal is to eliminate lambda completely, so we don’t bother to substitute lambda for or in the last two lines. Having preserved the effects of the nullable variables, we now eliminate lambda:
Since the grammar did not generate the empty string to begin with, the latter grammar generates the same language as the original. If desired, we can further simplify this grammar by eliminating , since it is equivalent to ...