
Deterministic finite automata


Jan 24, 2024

content

Characteristics of a DFA

Notations

Examples

Example 1: Binary strings of odd length

Example 2: Counting characters of each type

Example 3: Recognizing integers divisible by 3

Applications of DFA

Way forward

In the theory of computation, a language is a set of strings over an alphabet. There are different categories of languages, and each category represents a set of computational problems.

One primitive category is regular languages.

A regular language is one that can be represented by a regular expression or a finite automaton.

A deterministic finite automaton (DFA) is an abstract mathematical model composed of the following components:

An alphabet $\Sigma$ (a finite set of symbols)
A finite set of states $Q$
A set of final states $F$ such that $F \subseteq Q$
A dedicated start state $q_0$ where $q_0 \in Q$

It also contains a transition function $\delta: Q \times \Sigma \rightarrow Q$, where each individual transition is defined as:

\delta(q,a)=p

where $p, q \in Q$ and $a \in \Sigma$. If we read $a$ while in state $q$, the transition takes us to state $p$.

The language of the DFA is defined as the set of strings accepted by the DFA. A string is accepted if the sequence of transitions obtained by reading each character of the string from left to right leads from $q_0$ to one of the final states of the DFA. Acceptance can be notated using an extended transition function $\delta^*$, which applies $\delta$ once per character of the string:

\delta^*(q_0,w) \in F

Here, $w$ is a string composed of the characters of $\Sigma$. The language comprises all strings $w \in \Sigma^*$ for which $\delta^*(q_0,w) \in F$.
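The acceptance condition can be sketched in a few lines of Python. This is only an illustrative sketch: the names `delta_star` and `accepts`, the dictionary encoding of $\delta$, and the small two-state DFA (which accepts binary strings ending in `1`) are assumptions made for this example and are not part of the formal definition.

```python
from typing import Dict, Hashable, Set, Tuple

State = Hashable
Delta = Dict[Tuple[State, str], State]


def delta_star(delta: Delta, q: State, w: str) -> State:
    """Extended transition function: apply delta once per character, left to right."""
    for a in w:
        # A KeyError here means the character is not in the alphabet.
        q = delta[(q, a)]
    return q


def accepts(delta: Delta, q0: State, finals: Set[State], w: str) -> bool:
    """A string is accepted if delta*(q0, w) lands in a final state."""
    return delta_star(delta, q0, w) in finals


# Hypothetical DFA over {0, 1}: state "q" means "the last character read was 1".
delta: Delta = {
    ("p", "0"): "p", ("p", "1"): "q",
    ("q", "0"): "p", ("q", "1"): "q",
}

print(accepts(delta, "p", {"q"}, "0101"))  # True
print(accepts(delta, "p", {"q"}, "10"))    # False
```

Storing $\delta$ as a dictionary keyed by (state, symbol) pairs mirrors the signature $Q \times \Sigma \rightarrow Q$: every pair maps to exactly one next state, which is what makes the automaton deterministic.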

A DFA is defined as a tuple $(Q,\Sigma,\delta,q_0,F)$ with $F \subseteq Q$. In a state diagram, its components are notated as follows:

State: Each state $q \in Q$ is notated as a labeled circle.
Start state: The start state $q_0$ of the DFA is notated as a labeled circle having an unlabeled incoming edge (arrow).
Final state: Each final state $q' \in F$ is notated as a labeled double circle (circle in a circle).
Transition: Each transition $\delta(q,a)=p$ is notated as a labeled directed edge (arrow) going from state $q$ to state $p$. The label has to be a symbol $a \in \Sigma$.

As a first example, consider binary strings of odd length. Formally, the language is described over $\Sigma=\{0,1\}$ as follows:

L_1=\{w \in \Sigma^*: |w|=2k+1 \text{ where }k\in\mathbb{Z},\ k \geq 0\}

$\Sigma^*$ notates the set of all strings over $\Sigma$ of any length, including the empty string of length zero. For example, $000$, $110$, and $11111$ are members of $L_1$ because their length (number of characters) is odd.

To construct the deterministic finite automaton for this language, we need two states. Between them, the states memorize whether the length of the input read so far is even or odd. In other words, the states memorize $|w| \bmod 2$, i.e., $0$ or $1$. A DFA has no external memory, so this limited amount of information must be hardwired into the states. For convenience, we label the states $q_{even}$ and $q_{odd}$. With every new input character, the state changes from odd to even or from even to odd. In the resultant DFA, $q_{even}$ is the start state (the empty string has length zero) and $q_{odd}$ is the only final state.
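The two-state construction can be written out as a transition table and simulated directly. The following is a minimal sketch; the function name `accepts_odd_length` and the dictionary encoding are illustrative choices, while the state labels follow the text.

```python
# Transition table: every input character flips the parity of the length.
DELTA_ODD = {
    ("q_even", "0"): "q_odd",
    ("q_even", "1"): "q_odd",
    ("q_odd", "0"): "q_even",
    ("q_odd", "1"): "q_even",
}


def accepts_odd_length(w: str) -> bool:
    """Simulate the DFA: start in q_even, accept if we end in q_odd."""
    state = "q_even"
    for ch in w:
        state = DELTA_ODD[(state, ch)]
    return state == "q_odd"


for w in ["000", "110", "11111", "10", ""]:
    print(repr(w), accepts_odd_length(w))
# '000' True
# '110' True
# '11111' True
# '10' False
# '' False
```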

As a second example, consider counting characters of each type. Formally, the language is described over $\Sigma=\{a,b\}$ as follows:

L_2=\{w \in \Sigma^* : \text{the number of } a \text{'s is even and the number of }b \text{'s is odd} \}

The language $L_2$ contains strings like $b$, $aab$, $aba$, $bbbaa$, $babab$, etc.

Two parity counters are involved in recognizing this language. After any transition, the number of $a$'s is either even or odd, and the same holds for the number of $b$'s. At the start, both counts are zero, so both are even; the start state is therefore $q_{EE}$, where $EE$ notates even and even. The first symbol in this notation represents the parity of the number of $a$'s and the second symbol represents the parity of the number of $b$'s. There are four such combinations, so the DFA has four states: $q_{EE}$, $q_{EO}$, $q_{OE}$, and $q_{OO}$. Reading an $a$ flips the first symbol and reading a $b$ flips the second. Because $L_2$ requires an even number of $a$'s and an odd number of $b$'s, the only final state is $q_{EO}$.
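The four parity states and their transitions can be listed explicitly. The sketch below is one possible encoding: the state names drop the $q$ prefix (so `"EO"` stands for $q_{EO}$), and the function name `accepts_even_a_odd_b` is made up for this example.

```python
# First letter: parity of the number of a's; second letter: parity of b's.
# Reading 'a' flips the first letter, reading 'b' flips the second.
DELTA_PARITY = {
    ("EE", "a"): "OE", ("EE", "b"): "EO",
    ("OE", "a"): "EE", ("OE", "b"): "OO",
    ("EO", "a"): "OO", ("EO", "b"): "EE",
    ("OO", "a"): "EO", ("OO", "b"): "OE",
}


def accepts_even_a_odd_b(w: str) -> bool:
    """Start in EE (zero a's, zero b's); accept in EO (even a's, odd b's)."""
    state = "EE"
    for ch in w:
        state = DELTA_PARITY[(state, ch)]
    return state == "EO"


for w in ["b", "aab", "aba", "bbbaa", "babab", "ab", ""]:
    print(repr(w), accepts_even_a_odd_b(w))
# 'b' True
# 'aab' True
# 'aba' True
# 'bbbaa' True
# 'babab' True
# 'ab' False
# '' False
```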

As a third example, the language containing integers that are divisible by 3 can be formally described as follows over $\Sigma=\{0,1,2,\ldots,9\}$:

L_3=\{w\in \Sigma^*: w \% 3 = 0 \text{ and }|w| \geq 1\}

This language is composed of non-empty strings of digits from $0$ to $9$. Any integer (a string over $\Sigma$) either is a multiple of $3$ or is not. When it is a multiple of $3$, the remainder is zero; otherwise, the remainder is either $1$ or $2$. Memorizing the remainder is what matters here. So, each state represents the remainder obtained when the string consumed up to that state is divided by $3$. We label these states $0$, $1$, and $2$ to represent the respective remainders; state $0$ is the only final state. Additionally, we need a separate, non-final start state $s$, because the empty string must be rejected; from $s$, the first digit takes us to one of the states notating a remainder.

For example, if the first digit of the input is $1$, $4$, or $7$, then its remainder when divided by $3$ is $1$. These three input characters should take us from $s$ to state $1$. When at state $1$, if the next digit is $0$, the string read so far is $10$, $40$, or $70$, depending on what the first character was. When divided by $3$, its remainder remains $1$. The same holds for the digits $3$, $6$, and $9$, which is why the loop labeled $0$, $3$, $6$, $9$ keeps us in the same state. But if we read $1$, $4$, or $7$ as the second character, that digit is appended to the remainder computed so far, i.e., $1$. So, the two-character string to consider is of the form $11$, $14$, or $17$. Because the remainder of each of these when divided by $3$ is $2$, the transition takes us to state $2$. Similarly, if the second character is $2$, $5$, or $8$, the result of appending it to the remainder $1$ is $12$, $15$, or $18$. Because the remainder when divided by $3$ is $0$, it takes us to state $0$. This logic is replicated across all transitions from all the states: from state $r$, reading digit $d$ leads to state $(10r+d) \bmod 3$.
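The "append the digit to the remainder" rule can be used to generate the whole transition table rather than writing out all forty entries by hand. The following is a hedged sketch: the names `DELTA_MOD3` and `divisible_by_3` are illustrative, the start state is encoded as the string `"s"`, and the remainder states are encoded as the integers `0`, `1`, and `2`.

```python
DIGITS = "0123456789"

# From the start state s, the first digit d leads to state d mod 3.
DELTA_MOD3 = {("s", d): int(d) % 3 for d in DIGITS}

# From remainder state r, appending digit d gives the number 10*r + d,
# so the next state is (10*r + d) mod 3.
DELTA_MOD3.update(
    {(r, d): (10 * r + int(d)) % 3 for r in (0, 1, 2) for d in DIGITS}
)


def divisible_by_3(w: str) -> bool:
    """Accept non-empty digit strings whose value is a multiple of 3."""
    state = "s"
    for ch in w:
        state = DELTA_MOD3[(state, ch)]
    return state == 0  # s itself is not final, so "" is rejected


print(divisible_by_3("18"))  # True
print(divisible_by_3("14"))  # False
print(divisible_by_3(""))    # False
```

Only the remainder is carried from one step to the next, never the number itself, which is why three remainder states (plus the start state) suffice for inputs of any length.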

Written By:

Malik Jahan