...

/

Decision Algorithms

Decision Algorithms

Learn about the dynamic programming-based CYK algorithm to parse context-free languages.

The CYK algorithm for parsing CFLs

The most fundamental question to ask about a context-free language is whether a given string is in the language (this is the parsing problem). If a deterministic pushdown automata (DPDA) is available, then the answer is as simple as running a string through it, but many context-free languages are not deterministic. There happens to be an algorithm, however, that works for all context-free languages. This is where the Chomsky normal form for context-free grammars is useful.

Recall that rules in a CNF grammar come in only two forms: where the right-hand side of a rule is either a single terminal, or exactly two nonterminals. The algorithm we have in mind begins by matching symbols of the string to the first type of rule, inferring the variables that produce each symbol, and then looks for rules where those variables appear as right-hand sides, so we can trace back to where those rules in turn originated. Continuing in this manner, if we eventually reach the start variable, then we know the string is in the language.

This algorithm, sometimes referred to as the "CYK Algorithm,"In honor of the authors Cocke, Younger, and Kasami, who discovered the algorithm independently. uses a problem-solving technique called dynamic programming to obtain the result. Dynamic programming solves problems in stages and uses the results from previous stages to solve later ones. We’ll illustrate the algorithm by testing the string baaabaaa with the following CFG in Chomsky normal form:

We can easily see that this grammar does generate the string baaabaaa. In fact, the grammar is ambiguous and there are multiple, distinct parse trees for this string. But let’s get to the algorithm.

Stage 1

We begin by finding all the variables that can produce substrings of length 1, that is, that directly produce the terminal symbols aa or bb. For the grammar in question, we observe that bb comes from XX and aa is generated by the variables XX, YY, and AA, which we denote as follows:

Stage 2

We now examine how all substrings of length 2 can be generated. There are only two distinct substrings of length 2 in the test string, namely baba and aaaa. We now use the information in Stage 1 to see where each component of each substring of length 2 comes from:

The bb in baba comes from the variable XX, and the aa comes from XX ...

Access this course and 1400+ top-rated courses and projects.