Decision Algorithms
Learn about the dynamic programming-based CYK algorithm to parse context-free languages.
The CYK algorithm for parsing CFLs
The most fundamental question to ask about a context-free language is whether a given string is in the language (this is the parsing problem). If a deterministic pushdown automata (DPDA) is available, then the answer is as simple as running a string through it, but many context-free languages are not deterministic. There happens to be an algorithm, however, that works for all context-free languages. This is where the Chomsky normal form for context-free grammars is useful.
Recall that rules in a CNF grammar come in only two forms: where the right-hand side of a rule is either a single terminal, or exactly two nonterminals. The algorithm we have in mind begins by matching symbols of the string to the first type of rule, inferring the variables that produce each symbol, and then looks for rules where those variables appear as right-hand sides, so we can trace back to where those rules in turn originated. Continuing in this manner, if we eventually reach the start variable, then we know the string is in the language.
This algorithm, sometimes referred to as the
We can easily see that this grammar does generate the string . In fact, the grammar is ambiguous and there are multiple, distinct parse trees for this string. But let’s get to the algorithm.
Stage 1
We begin by finding all the variables that can produce substrings of length 1, that is, that directly produce the terminal symbols or . For the grammar in question, we observe that comes from and is generated by the variables , , and , which we denote as follows:
Stage 2
We now examine how all substrings of length 2 can be generated. There are only two distinct substrings of length 2 in the test string, namely and . We now use the information in Stage 1 to see where each component of each substring of length 2 comes from:
The in comes from the variable , and the comes from ...