What is the Viterbi algorithm?

Andrew Viterbi proposed the Viterbi algorithm in 1967. It was originally devised to decode convolutional codes over noisy digital communication links, and it is now used in various fields, including information theory, bioinformatics, speech processing, computational linguistics, and communication systems.

Given a sequence of observed states $O = \{q_1, q_2, \ldots, q_t\}$ of a hidden Markov model (HMM), the Viterbi algorithm is a dynamic programming algorithm that finds the most probable sequence of hidden states in the underlying Markov process. This sequence is also called a Viterbi path.
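To state this a bit more formally (introducing $s_1, \ldots, s_t$ here only to denote the hidden states), the Viterbi path is the sequence

$S^* = \underset{s_1, \ldots, s_t}{\arg\max} \; P(s_1, \ldots, s_t \mid q_1, \ldots, q_t),$

that is, the single hidden state sequence with the highest probability given the observations.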

Here's a quick recap of some of the terms used in HMMs (a minimal code sketch after this list shows how they fit together):

  1. The start state represents the initial state of the model.

  2. Observed or emitted states are each hidden state’s measurable or observable outcomes.

  3. The hidden state represents a system’s unobservable or latent state.

  4. Start probabilities represent the initial probabilities of being in each hidden state at the start of the sequence.

  5. Transition probabilities describe the likelihood of transitioning from one hidden state to another.

  6. The transition matrix (A) holds all transition probabilities between hidden states.

  7. Emission probabilities define the probability distribution of observing a particular state given the current hidden state.

  8. The emission matrix (B) captures all emission probabilities of each hidden state.
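To see how these pieces map to code, here is a minimal, purely illustrative sketch with two hypothetical hidden states. The names and numbers below are made up for illustration and are unrelated to the market example developed next:

import numpy as np

# Hypothetical two-state HMM, for illustration only
states = ['Rainy', 'Sunny']              # hidden states
emissions = ['Umbrella', 'No umbrella']  # observed (emitted) states

start_probs = np.array([0.5, 0.5])       # start probabilities

# Transition matrix A: A[i, j] = P(next hidden state j | current hidden state i)
matrixA = np.array([[0.7, 0.3],
                    [0.4, 0.6]])

# Emission matrix B: B[i, k] = P(observing emission k | hidden state i)
matrixB = np.array([[0.9, 0.1],
                    [0.2, 0.8]])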

An example HMM

Suppose there are three markets: a food street, a clothes market, and an electronics market. Initially, we can move to any market with some initial probability; for this example, we assume these initial probabilities are equal. Assume the following transition matrix from one market to another:

Transition Matrix (A)

From \ To            | Food Street | Clothes Market | Electronics Market
Initial State        | 0.33        | 0.33           | 0.33
Food Street          | 0.1         | 0.6            | 0.3
Clothes Market       | 0.45        | 0.15           | 0.4
Electronics Market   | 0.4         | 0.4            | 0.2

Each market has a different mix of shops. Let's assume the shop ratio for each market is as follows:

  • Food street: 70% food shops, 20% clothing shops, and 10% electronics shops

  • Clothes market: 65% clothing shops, 25% food shops, and 10% electronics shops

  • Electronics market: 75% electronic shops, 15% food shops, and 10% clothing shops

This builds the following emission matrix:

Emission Matrix (B)

State \ Observation  | Food | Clothes | Electronics
Food Street          | 0.7  | 0.2     | 0.1
Clothes Market       | 0.25 | 0.65    | 0.1
Electronics Market   | 0.15 | 0.1     | 0.75

By now, we’ve constructed a simple HMM that we can see in the following illustration:

Illustration for our HMM example

The Viterbi algorithm

The Viterbi algorithm uses the transition and emission probabilities of the HMM to fill a maximal probabilities matrix (C), which stores the intermediate optimal probabilities, and a maximal transition path matrix (D), which holds the indexes of the visited states. At each step, we select the state with the maximum probability and keep track of the path leading to that state. Lastly, we backtrack through the matrix D to find the most probable sequence of states.
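In equation form, this is the standard Viterbi recurrence. Writing $a_{i,j}$ for an entry of the transition matrix A, $b_j(q_k)$ for the probability of emitting observation $q_k$ from hidden state $j$, and $\pi_j$ for the start probability of state $j$ (indexing conventions vary between texts; here the first index is the observation and the second is the hidden state, matching the code below):

$C_{1,j} = \pi_j \, b_j(q_1), \qquad D_{1,j} = 0$

$C_{k,j} = \max_i \left( C_{k-1,\,i} \; a_{i,j} \right) b_j(q_k), \qquad D_{k,j} = \underset{i}{\arg\max} \left( C_{k-1,\,i} \; a_{i,j} \right), \qquad k = 2, \ldots, t.$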

Consider the observed state sequence {Clothes, Food, Clothes, Electronics}. Let's find the most likely hidden states using the Viterbi algorithm in the following three steps:

Initializing

In this step, we initialize and fill the first columns of matrices C and D. The first column of matrix C is the start probability of each hidden state multiplied by that state's emission probability for the first observation. Since we are coming from the initial state, the first column of matrix D will be all zeros.
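As a concrete sketch of this step for our example, the first observation is Clothes, so the first column of C is the start probabilities multiplied element-wise by each market's emission probability for Clothes:

import numpy as np

initial_probs = np.array([0.33, 0.33, 0.33])    # Food Street, Clothes Market, Electronics Market
emission_clothes = np.array([0.2, 0.65, 0.1])   # B[:, Clothes]

first_column_of_C = initial_probs * emission_clothes
print(first_column_of_C)   # [0.066  0.2145 0.033]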


Forward pass

For each possible state {Food Street, Clothes Market, Electronics Market} in the next observation column, we'll fill the next column of matrix C with the product of the previous state's probability, the transition probability into the current state, and the emission probability of the current observation. We'll perform this for each previous state and record the maximum.

For matrix D, we'll store the index of the previous state that produced the maximum value in the corresponding cell of matrix C.
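For instance, here is a sketch of how the Food Street entry for the second observation (Food) would be computed from the first column above; the best of the three incoming paths comes from the Clothes Market (values are approximate):

import numpy as np

prev_column = np.array([0.066, 0.2145, 0.033])     # column of C for the first observation (Clothes)
trans_to_food_street = np.array([0.1, 0.45, 0.4])  # A[:, Food Street]
emit_food_from_food_street = 0.7                   # B[Food Street, Food]

candidates = prev_column * trans_to_food_street * emit_food_from_food_street
print(candidates.max())      # ~0.0676, stored in C
print(candidates.argmax())   # 1 (Clothes Market), stored in D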


Backward pass

In the backward pass, we'll use the matrices C and D to generate the most likely sequence of hidden states. We start from the last column of matrix C and select the cell with the highest probability. Next, for that cell, we'll look up the corresponding cell in matrix D and start backtracking: for each index we find in a column of matrix D, we travel backward to the stored index until we reach the first column.
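For our example, the trace works out as follows (probabilities rounded; they come from running the forward pass on the matrices defined earlier):

  • The last column of C is approximately [0.0012, 0.0004, 0.0079], so the highest-probability final state is the Electronics Market.

  • Following D backward from that cell gives the Clothes Market for the third observation, the Food Street for the second, and the Clothes Market for the first.

  • Reading the backtracked path in forward order yields Clothes Market → Food Street → Clothes Market → Electronics Market.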


For the observed sequence {Clothes, Food, Clothes, Electronics}, the most likely hidden state sequence is {Clothes Market, Food Street, Clothes Market, Electronics Market}.

Code example

Let’s implement the Viterbi algorithm in the following playground:

import numpy as np

def ViterbiAlgorithm(hmm, observations):
    # Initialization step
    matrixC = np.zeros((len(observations), len(hmm['states'])))
    obs_ind = hmm['emissions'].index(observations[0])
    matrixC[0] = hmm['initial_probs'] * hmm['matrixB'][:, obs_ind]
    matrixD = np.ones((len(observations), len(hmm['states']))) * -1

    # Forward pass
    for curr_obs in range(1, len(observations)):
        obs_ind = hmm['emissions'].index(observations[curr_obs])
        for curr_stat in range(len(hmm['states'])):
            curr_stat_probs = matrixC[curr_obs-1] * hmm['matrixA'][:, curr_stat] * hmm['matrixB'][curr_stat, obs_ind]
            matrixC[curr_obs, curr_stat] = max(curr_stat_probs)
            matrixD[curr_obs, curr_stat] = np.argmax(curr_stat_probs)

    # Backward pass
    output = []
    state_ind = np.argmax(matrixC[-1])
    for curr_col in range(len(observations)-1, -1, -1):
        output.insert(0, hmm['states'][state_ind])
        state_ind = int(matrixD[curr_col][state_ind])

    return output

# Define the HMM from the tables above
hmm = {
    'states': ['Food Street', 'Clothes Market', 'Electronics Market'],
    'initial_probs': np.array([0.33, 0.33, 0.33]),
    'matrixA': np.array([[0.1, 0.6, 0.3],
                         [0.45, 0.15, 0.4],
                         [0.4, 0.4, 0.2]]),
    'emissions': ['Food', 'Clothes', 'Electronics'],
    'matrixB': np.array([[0.7, 0.2, 0.1],
                         [0.25, 0.65, 0.1],
                         [0.15, 0.1, 0.75]])
}

observations = ['Clothes', 'Food', 'Clothes', 'Electronics']
output_seq = ViterbiAlgorithm(hmm, observations)
print('Observations --> Most likely state')
print('----------------------------------')
for obs, state in zip(observations, output_seq):
    print(obs, ' --> ', state)

Code explanation

Let’s explain the code above in detail:

  • Lines 3–25: We define the ViterbiAlgorithm() function, which receives two parameters. The hmm dictionary holds the states, initial probabilities, transition matrix, emission states, and emission matrix. The observations list contains the observed state sequence. Inside this function:

    • Lines 5–8: We initialize matrixC as a NumPy array with the dimensions of observations by states to store the maximal probabilities. We store the index of the first observation within the emissions list in obs_ind. We multiply the initial probabilities by the emission probabilities of the first observation and update the first column of matrixC. We also initialize matrixD to store the state transitions.

    • Lines 11–16: We implement the forward pass to fill matrixC and matrixD. We loop over the observations, and for each observation, we use a nested loop to calculate the maximal probability for each state. To calculate it, we multiply the probabilities in the previous column of matrixC, the transition probabilities from the previous states to the current state, and the emission probability of the current observation for the current state. We store these products in curr_stat_probs and call the max() function to record the maximal probability of the current state in matrixC. Lastly, we call np.argmax() to store the index of the best previous state in matrixD.

    • Lines 19–23: We define the output list to store the most likely sequence of states. We store the index of the maximum probability in the last column of matrixC in state_ind. We then loop backward over the observations. Inside this loop, we add the state to the start of the output list and read the corresponding cell of matrixD to update state_ind for the previous observation.

    • Line 25: Lastly, we return the output list.

  • Lines 28–39: We define an hmm dictionary with the information required by the Viterbi algorithm.

  • Line 40: We define the observations list with the observed states.

  • Line 41: We call the ViterbiAlgorithm() function and store the output in the output_seq list.

  • Lines 42–45: We use a for loop to print, for each observation, the most likely state next to it.
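If everything is wired up as above, running the playground should print output along the following lines, matching the hidden state sequence we derived earlier:

Observations --> Most likely state
----------------------------------
Clothes  -->  Clothes Market
Food  -->  Food Street
Clothes  -->  Clothes Market
Electronics  -->  Electronics Market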

Time complexity

For each hidden state at one observation, we need to calculate the probability of every path coming from the hidden states of the previous observation. If there are $n$ hidden states, the time complexity of computing the probabilities for one hidden state is $O(n)$, and for all hidden states of one observation it is $O(n^2)$. If there are $t$ observations, the total time complexity of the Viterbi algorithm is $O(t \times n^2)$. This is far cheaper than naively evaluating all $n^t$ possible hidden state sequences.

Applications

We can utilize the Viterbi algorithm in various domains. Let's look at some real-life applications of the Viterbi algorithm.

  • Handwriting recognition: We can use the Viterbi algorithm to decode the sequences of observed pixel values into the most likely sequence of characters to recognize the handwriting. We can use different characters as states and pixel values as observations.

  • DNA sequence analysis: We can use the Viterbi algorithm to identify the most probable sequence of nucleotides underlying an observed DNA sequence. The nucleotides (A, T, C, G) can be treated as the states, and the observed DNA sequence as the observations.

  • Part-of-speech tagging: The Viterbi algorithm can be utilized to predict the POS for each word of the sentence. The POS categories (e.g., noun, verb, and others) are the states, and the words of the sentence are the observations.

  • Gesture recognition: We can employ the Viterbi algorithm to recognize the most likely sequence of gestures given the observed sensor data. The gesture types (e.g., tap, pinch, and swipe) are the states, and the sequence of sensor data is the observations.

  • Electrocardiogram (ECG) analysis: The Viterbi algorithm can assist in diagnosing cardiac conditions by decoding the most likely sequence of conditions from an ECG signal. The ECG signal serves as the observations, and the cardiac conditions are the hidden states of the model.

Limitations

The Viterbi algorithm is a powerful tool for many kinds of applications, but we still need to consider the following limitations before using it:

  • Computational complexity: The running time of this algorithm depends on the number of hidden states and the length of the observation sequence. Longer sequences or a larger number of hidden states require more computational power.

  • Memory usage: Storing matrices C and D, each with dimensions of the number of observations by the number of states, requires memory proportional to their size. Longer sequences or a larger number of hidden states require more memory.

  • Limited to one most likely path: The Viterbi algorithm finds only a single most likely sequence of hidden states. There may be other sequences with slightly lower probability that are also worth considering. If a problem requires exploring more than one path, this is a limitation.

  • Unknown elements: The Viterbi algorithm only deals with known elements. For elements the model has not seen (for example, an observation that does not appear in the emission matrix), its behavior is unpredictable.

Test your understanding

Consider the following matrices:

Matrices C and D for the observation sequence {Food, Food, Electronics, Clothes}
Q: What is the most likely state sequence output for the above matrices C and D and the observation sequence [Food, Food, Electronics, Clothes]?

A) [Food Street, Food Street, Electronics Market, Clothes Market]

B) [Food Street, Clothes Market, Electronics Market, Clothes Market]

C) [Food Street, Clothes Market, Clothes Market, Electronics Market]

D) [Electronics Market, Food Street, Electronics Market, Clothes Market]
