Home/Blog/Programming/What is a context-free language?
Home/Blog/Programming/What is a context-free language?

What is a context-free language?

9 min read
Apr 25, 2024

In the theory of computation, a language represents a computational problem. Languages are classified into different classes based on the inherent complexity and resources required to recognize/decide a language. These classes of languages include:

In this blog, we focus on context-free languages. A context-free language (CFL) is a language that can be recognized by context-free grammar (CFG) or non-deterministic push-down automata (NPDA). CFG and NPDA are equivalent in their computational capability and mutually convertible. This blog focuses only on the design of CFGs.

Context-free grammar (CFG)#

A context-free grammar (CFG) is defined as follows:

G:(V,Σ,P,S),G:(V, \Sigma, P, S),

where the grammar GG is composed of the 4-tuple: PP is a set of productions, VV is a set of variables, Σ\Sigma is a set of alphabets, and SS is a designated start variable so that SVS \in V. Each production (or rule of replacement) is defined in the following format:

XY,X \rightarrow Y,

where XVX \in V, and YY can be any one of the following substitutions:

  • Sequence of variables concatenated with each other
  • Sequence of alphabets concatenated with each other
  • Any finite combination of the above two formats that can be repeated if needed
  • ϵ\epsilon (empty string whose length is 00)

CFG Examples#

Example 1: Two 0’s in a string#

Let’s design CFG for the following language:

L1={w{0,1}: number of 0’s in w is 2}.L_1=\{w \in \{0,1\}^* : \text{ number of 0's in }w \text{ is 2}\}.

We decomposed the language into the following logical components:

  • 0’s can appear anywhere in the string of this language. Their count needs to be restricted to 2.
  • 1’s can appear anywhere in the string in any count (including zero number of 1’s).

The first production of the CFG, starting with the designated start symbol SS makes sure that two 0’s are part of any valid string of the language. At the same time, the production allows possible substitutions of 1’s before both 0’s, after both 0’s, and between two 0’s. The CFG for L1L_1 is composed of the following productions:

SX0X0XS \rightarrow X0X0X

X1XX \rightarrow 1X

Xϵ.X \rightarrow \epsilon.

Let’s look at the derivation of an example string: 0110101101. A derivation is a sequence of substitutions of the productions starting from the designated start symbol SS. The following steps show the derivation of this string using the CFG:

SX0X0Xϵ0X0Xϵ01X0XS \Rightarrow X0X0X \Rightarrow \epsilon 0X0X \Rightarrow \epsilon 01X0X

ϵ011X0Xϵ011ϵ0X\Rightarrow \epsilon 011X0X \Rightarrow \epsilon 011 \epsilon 0X

ϵ011ϵ01Xϵ011ϵ01ϵ01101.\Rightarrow \epsilon011 \epsilon 01X \Rightarrow \epsilon011 \epsilon 01 \epsilon \Rightarrow 01101.

Note: Derivation is a logical sequence of productions of the CFG being applied to derive a given string. There can be more than one possible derivation for a given string if it belongs to the language of the CFG.

The derivation of a string can be visualized as a derivation tree. In a derivation tree, the root rode is the starting symbol of the CFG, i.e., SS. If SS has to be substituted with a production whose length is kk, then there are kk child nodes of SS. All terminals, including ϵ\epsilon, become the leaf nodes. All internal nodes of the tree are variables of the CFG. Reading the leaf nodes from left to right outputs the derived string. A derivation tree for the string 0110101101 is shown below:

Derivation tree of 01101
Derivation tree of 01101

Example 2: a’s followed by an equal or less number of b’s#

Let’s design CFG for the following language:

L2={anbm:nm and n,m0}.L_2=\{a^nb^m: n \geq m \text{ and } n,m \geq0\}.

We decompose the language into the following parts:

  • We should maintain the order of the alphabet: aa’s precede bb’s.
  • With every bb, there must be a preceding aa.
  • When all bb’s are done, there should be a provision to add more aa’s if needed.

The first production of the CFG has two variables. The first variable reads any number of aa’s including zero number of aa’s. The second variable maintains a balance and order between aa’s and bb’s. The CFG of L2L_2 is composed of the following productions:

SXYS \rightarrow XY

XaXϵX \rightarrow aX | \epsilon

YaYbϵ.Y \rightarrow aYb | \epsilon.

Note: This CFG is one possible solution for the given language. Think of a variation of the solution such that your CFG uses only one variable to recognize this language.

Let’s look at the derivation of an example string: aaabbaaabb. The following steps show the derivation of this string using the CFG:

SXYaXYS \Rightarrow XY \Rightarrow aXY

aϵYaϵaYbaϵaaYbb\Rightarrow a\epsilon Y \Rightarrow a\epsilon aYb \Rightarrow a\epsilon aaYbb

aϵaaϵbbaaabb. \Rightarrow a\epsilon aa\epsilon bb \Rightarrow aaabb.

Derivation tree of aaabbaaabb is shown below:

Derivation tree of aaabb
Derivation tree of aaabb

Example 3: Palindrome#

Let’s design CFG for the following language:

L3={w=wR:w{a,b}}.L_3=\{w=w^R: w\in \{a,b\}^*\}.

The strings that are the same as that of their reversal, i.e., w=wRw=w^R, are called palindromes. Example strings belonging to this language include aabaaaabaa, baabbaab, bb, etc. We decompose the language into two types of string:

  • Strings of even length
  • Strings of odd length

For strings of even length, both ends of the string should be reading the same character. The middle variable should be substituted with ϵ\epsilon at the end to maintain even length. For strings of odd length, both ends of the string should be reading the same character and the middle variable should be substituted with a character of length 1 at the end to maintain an odd length. The middle character is going to stay common in ww and wRw^R. We use the same method as that of even-length strings and, in the end, insert the required character at the midpoint to make it an odd-length string.

In the CFG, the start symbol SS will have five possibilities: two will be recursive possibilities to extend even-length strings. The last three possibilities will substitute aa, bb, or ϵ\epsilon to conclude the derivation.

SaSabSbabϵ.S \rightarrow aSa | bSb|a|b|\epsilon.

Let’s derive an example string bababbabab from this CFG:

SbSbbaSabbabab.S \Rightarrow bSb \Rightarrow baSab\Rightarrow babab.

Derivation tree of bababbabab is shown below:

Derivation tree of babab
Derivation tree of babab

Example 4: Alternate 0’s and 1’s#

Let’s design CFG for the following language:

L4={w:w{0,1} with 0’s and 1’s appearing alternatively }.L_4=\{w: w\in \{0,1\}^* \text{ with 0's and 1's appearing alternatively }\}.

In this language, all those strings are included that don’t contain consecutive 0’s or 1’s. Example strings include 00, 101101, 01010101, etc. We decompose the language into four types of strings:

  • Strings of odd length starting with a 0
  • Strings of odd length starting with a 1
  • Strings of even length starting with a 0
  • Strings of even length starting with a 1

The CFG takes care of all these possibilities. In each possibility, it has to make sure that 0’s and 1’s alternate. Each possibility is represented by a variable as shown in the following CFG:

SABCDS \rightarrow A | B | C | D

A0W0A \rightarrow 0W | 0

W1AW \rightarrow 1A

B1X1B \rightarrow 1X | 1

X0BX \rightarrow 0B

C0YϵC \rightarrow 0Y | \epsilon

Y1CY \rightarrow 1C

D1ZϵD \rightarrow 1Z | \epsilon

Z0D.Z \rightarrow 0D.

Let’s derive an example string using this CFG. We derive 1010110101. This is an odd-length string starting with a 11.

SB1X10BS \Rightarrow B \Rightarrow 1X \Rightarrow 10B

101X1010B10101\Rightarrow 101X \Rightarrow 1010B \Rightarrow 10101

Derivation tree of 1010110101 is shown below:

Derivation tree of 10101
Derivation tree of 10101

Example 5: Number of c’s are the absolute difference between the number of a’s and b’s#

Let’s design a CFG for the following language:

L5={aibjck:k=ij,i,j,k0}.L_5=\{a^ib^jc^k: k=|i-j|, i, j, k \geq 0\}.

Example strings belonging to this language include aabcaabc, aabbbcaabbbc, abbbccabbbcc, bcbc, etc.

There are two types of strings in this language:

  • aibjck:k=ija^ib^jc^k: k=i-j
  • aibjck:k=jia^ib^jc^k: k=j-i

The CFG for this language takes care of these possibilities. Each possibility is represented by a variable:

SXYS \rightarrow X|Y

XaXcWX \rightarrow aXc | W

WaWbϵW \rightarrow aWb|\epsilon

YWZY \rightarrow WZ

ZbZcϵ.Z \rightarrow bZc|\epsilon.

Let’s derive an example string aabbbcaabbbc from this CFG:

SYWZaWbZaaWbbZS \Rightarrow Y \Rightarrow WZ \Rightarrow aWbZ \Rightarrow aaWbbZ

aaϵbbZaaϵbbbZcaaϵbbbϵcaabbbc.\Rightarrow aa\epsilon bbZ \Rightarrow aa\epsilon bbbZc \Rightarrow aa\epsilon bbb\epsilon c \Rightarrow aabbbc.

The derivation tree of aabbbcaabbbc is shown below:

Derivation tree of aabbbc
Derivation tree of aabbbc

Concluding remarks#

CFGs recognize context-free languages. Intuitively, all those languages that require a limited amount of memory in the form of a stack can be represented by a CFG. Stack limits the number of simultaneous comparisons due to its operations restricted to push and pop. At one time, two entities can be compared using a stack. One item is pushed temporarily on the stack. At the time of comparison with the next item, it has to be popped from the stack. Such comparisons are represented by CFG productions of the form PaPbP \rightarrow aPb. It sets the upper limit on the problem-solving capacity of a CFG.

We hope you enjoyed exploring the CFG with different examples. For a deep dive, please consider the following course:

Theory of Computation

Cover
Theory of Computation

What are the mathematics behind computers? What are the theoretical foundations of computer languages? Such questions can be explored by understanding formal languages and automata models. In this comprehensive course, you’ll explore formal languages, regular languages, and how to model them with their associated automata models. Next, you’ll cover regular expressions and grammar, and their equivalents. Then, you’ll explore context-free languages (the foundations of programming languages), pushdown automata, and the relationship between them. Toward the end, you’ll learn about recursively enumerable languages, Turing machines with their variations, context-sensitive languages, and unrestricted grammars. By the end of the course, you’ll have gained a deep understanding of several formal languages and their associated automata. Also, since there are plenty of exercises throughout the course, your understanding and problem-solving skills will be thoroughly reinforced.

15hrs
Beginner
10 Playgrounds
9 Quizzes


Written By:
Khawaja Muhammad Fahd
 
Join 2.5 million developers at
Explore the catalog

Free Resources