The CYK algorithm is a parsing algorithm for context-free grammar. It is used to check if a particular string can be derived from the language generated by a given grammar. It is also called the membership algorithm as it tells whether the given string is a member of the given grammar or not. It was independently developed by three Russian scientists named Cocke, Younger, and Kasami, hence the name CYK!
In a CYK algorithm, the structure of grammar should be in Chomsky normal form.
In addition, the CYK algorithm uses the dynamic programming or table filling algorithm.
The grammar will be in
If the given Grammar is not in the CNF form, convert it to the CNF form before applying the CYK Algorithm.
In the CYK algorithm,
Each row corresponds to the length of the substrings of the given word ‘w’:
is a set of variables A such that is a production of grammar G, where is part of the given string ‘w’ or the whole string.
Compare at most n pairs of previously computed sets. For strings with a length of 2, compare two pairs, and for strings with a length of 3 compare three sets. In this way, the next sets (A) can be computed, which are derived from the given grammar as per the following formula:
In the equation, refers to the set of variables derived from the production rules.
Rule: If the top row consists of the starting variable of the grammar, then the given string can be derived from it.
Let’s look at an example to get a better understanding of the algorithm.
Consider the following CNF grammar :
The given word .
The first row is computed simply by looking at the grammar to see which production is producing the string of length 1 for the given word ‘w’.
The second row is computed by comparing the two strings that were computed previously. So we can have : , , , and .
For the third row, there are three possible substrings of length three in the given word, namely : , , and For the substring , there are two possibilities: and
Finding the sets to compute these:
We take the union of these as follows:
Then, we look in the grammar rules for populating the table.
The illustration below shows how the table is populated.
This is the final Triangular table.
Finally, we need to see if the given word is in the language of grammar
Yes. As we can see in the table, the cell X(1, 5) = (S, A, C). S is present in the final box. So, we can say that 'baaba' Є
The running time of the algorithm is O(n3).
Free Resources