What is the CYK algorithm?

The CYK algorithm is a parsing algorithm for context-free grammar. It is used to check if a particular string can be derived from the language generated by a given grammar. It is also called the membership algorithm as it tells whether the given string is a member of the given grammar or not. It was independently developed by three Russian scientists named Cocke, Younger, and Kasami, hence the name CYK!

Basics

In a CYK algorithm, the structure of grammar should be in Chomsky normal form.

In addition, the CYK algorithm uses the dynamic programming or table filling algorithm.

The grammar will be in CNFChomsky Normal Form if each rule has one of the following forms:

  • ABCA\rightarrow BC (at most two variables on the right-hand side)
  • AA\rightarrow a (a single terminal on the right-hand side)
  • SØS\rightarrowØ (null string)

If the given Grammar is not in the CNF form, convert it to the CNF form before applying the CYK Algorithm.

The algorithm

In the CYK algorithm,

  • Construct a triangular table of the length of your given string.
Table for string 'w' that has length 3

  • Each row corresponds to the length of the substrings of the given word ‘w’:

    • The bottom row will have strings of length 1.
    • The second row from the bottom will have strings of length 2 and so on.
  • X(i,i)X(i,i) is a set of variables A such that AwiA \rightarrow w_{i} is a production of grammar G, where wiw_{i} is part of the given string ‘w’ or the whole string.

  • Compare at most n pairs of previously computed sets. For strings with a length of 2, compare two pairs, and for strings with a length of 3 compare three sets. In this way, the next sets (A) can be computed, which are derived from the given grammar as per the following formula:

    (X(i,i),X((i+1),j),(X(i,i+1),X(i+2,j))(X(i,j1),Xj)(X(i, i) , X((i+1), j ), (X(i, i+1) , X(i+2, j) ) … (X(i, j-1), Xj)

    In the equation, X(i,i)X(i, i) refers to the set of variables derived from the production rules.

Rule: If the top row consists of the starting variable of the grammar, then the given string can be derived from it.

Example

Let’s look at an example to get a better understanding of the algorithm.

Consider the following CNF grammar GG:

  • SAB  BCS \rightarrow AB \space | \space BC

  • ABA  aA \rightarrow BA \space | \space a

  • BCC  bB \rightarrow CC \space | \space b

  • CAB  aC \rightarrow AB \space | \space a

The given word w=baabaw=baaba.

Step 1: Constructing the triangular table

Triangular table for the word 'baaba'

Step 2: Populating the table

  • The first row is computed simply by looking at the grammar to see which production is producing the string of length 1 for the given word ‘w’.

  • The second row is computed by comparing the two strings that were computed previously. So we can have w'w': ba'ba', aa'aa', ab'ab', and ba'ba'.

  • For the third row, there are three possible substrings of length three in the given word, namely w'w': baa'baa', aab'aab', and aba.'aba'. For the substring baa'baa', there are two possibilities: [b,aa]['b', 'aa'] and [ba,a].['ba', 'a'].

    Finding the sets to compute these: [b,aa]={B}{B}['b','aa'] = \{ B \}\{ B \}
    [ba,a]={S,A}['ba', 'a'] = \{ S, A \} {A,C}\{ A, C\}

    We take the union of these as follows:

{B}{B}U{S,A}{A,C}{BB}U{SA,SC,AA,AC}\{ B \}\{ B \} U \{ S, A \}\{ A,C \} \{ BB \} U \{ SA, SC, AA, AC \}

{BB,SA,SC,AA,AC},\rightarrow \{ BB, SA, SC, AA, AC \},

Then, we look in the grammar rules for populating the table.

  • We’ll repeat this for all substrings with lengths greater than 33.

The illustration below shows how the table is populated.

1 of 10

This is the final Triangular table.

Step 3: Check the table

Finally, we need to see if the given word is in the language of grammarLanguage of Grammar.

Yes. As we can see in the table, the cell X(1, 5) = (S, A, C). S is present in the final box. So, we can say that 'baaba' Є L(G)Language of Grammar..

Time complexity

The running time of the algorithm is O(n3).

Free Resources

Copyright ©2024 Educative, Inc. All rights reserved