In the theory of computation, a language represents a computational problem. Languages are classified into different classes based on the inherent complexity and resources required to recognize/decide a language. These classes of languages include:
In this blog, we focus on context-free languages. A context-free language (CFL) is a language that can be recognized by context-free grammar (CFG) or non-deterministic push-down automata (NPDA). CFG and NPDA are equivalent in their computational capability and mutually convertible. This blog focuses only on the design of CFGs.
A context-free grammar (CFG) is defined as follows:
where the grammar is composed of the 4-tuple: is a set of productions, is a set of variables, is a set of alphabets, and is a designated start variable so that . Each production (or rule of replacement) is defined in the following format:
where , and can be any one of the following substitutions:
Let’s design CFG for the following language:
We decomposed the language into the following logical components:
The first production of the CFG, starting with the designated start symbol makes sure that two 0’s are part of any valid string of the language. At the same time, the production allows possible substitutions of 1’s before both 0’s, after both 0’s, and between two 0’s. The CFG for is composed of the following productions:
Let’s look at the derivation of an example string: . A derivation is a sequence of substitutions of the productions starting from the designated start symbol . The following steps show the derivation of this string using the CFG:
Note: Derivation is a logical sequence of productions of the CFG being applied to derive a given string. There can be more than one possible derivation for a given string if it belongs to the language of the CFG.
The derivation of a string can be visualized as a derivation tree. In a derivation tree, the root rode is the starting symbol of the CFG, i.e., . If has to be substituted with a production whose length is , then there are child nodes of . All terminals, including , become the leaf nodes. All internal nodes of the tree are variables of the CFG. Reading the leaf nodes from left to right outputs the derived string. A derivation tree for the string is shown below:
Let’s design CFG for the following language:
We decompose the language into the following parts:
The first production of the CFG has two variables. The first variable reads any number of ’s including zero number of ’s. The second variable maintains a balance and order between ’s and ’s. The CFG of is composed of the following productions:
Note: This CFG is one possible solution for the given language. Think of a variation of the solution such that your CFG uses only one variable to recognize this language.
Let’s look at the derivation of an example string: . The following steps show the derivation of this string using the CFG:
Derivation tree of is shown below:
Let’s design CFG for the following language:
The strings that are the same as that of their reversal, i.e., , are called palindromes. Example strings belonging to this language include , , , etc. We decompose the language into two types of string:
For strings of even length, both ends of the string should be reading the same character. The middle variable should be substituted with at the end to maintain even length. For strings of odd length, both ends of the string should be reading the same character and the middle variable should be substituted with a character of length 1 at the end to maintain an odd length. The middle character is going to stay common in and . We use the same method as that of even-length strings and, in the end, insert the required character at the midpoint to make it an odd-length string.
In the CFG, the start symbol will have five possibilities: two will be recursive possibilities to extend even-length strings. The last three possibilities will substitute , , or to conclude the derivation.
Let’s derive an example string from this CFG:
Derivation tree of is shown below:
Let’s design CFG for the following language:
In this language, all those strings are included that don’t contain consecutive 0’s or 1’s. Example strings include , , , etc. We decompose the language into four types of strings:
The CFG takes care of all these possibilities. In each possibility, it has to make sure that 0’s and 1’s alternate. Each possibility is represented by a variable as shown in the following CFG:
Let’s derive an example string using this CFG. We derive . This is an odd-length string starting with a .
Derivation tree of is shown below:
Let’s design a CFG for the following language:
Example strings belonging to this language include , , , , etc.
There are two types of strings in this language:
The CFG for this language takes care of these possibilities. Each possibility is represented by a variable:
Let’s derive an example string from this CFG:
The derivation tree of is shown below:
CFGs recognize context-free languages. Intuitively, all those languages that require a limited amount of memory in the form of a stack can be represented by a CFG. Stack limits the number of simultaneous comparisons due to its operations restricted to push and pop. At one time, two entities can be compared using a stack. One item is pushed temporarily on the stack. At the time of comparison with the next item, it has to be popped from the stack. Such comparisons are represented by CFG productions of the form . It sets the upper limit on the problem-solving capacity of a CFG.
We hope you enjoyed exploring the CFG with different examples. For a deep dive, please consider the following course:
Theory of Computation
What are the mathematics behind computers? What are the theoretical foundations of computer languages? Such questions can be explored by understanding formal languages and automata models. In this comprehensive course, you’ll explore formal languages, regular languages, and how to model them with their associated automata models. Next, you’ll cover regular expressions and grammar, and their equivalents. Then, you’ll explore context-free languages (the foundations of programming languages), pushdown automata, and the relationship between them. Toward the end, you’ll learn about recursively enumerable languages, Turing machines with their variations, context-sensitive languages, and unrestricted grammars. By the end of the course, you’ll have gained a deep understanding of several formal languages and their associated automata. Also, since there are plenty of exercises throughout the course, your understanding and problem-solving skills will be thoroughly reinforced.
Free Resources