...
Part-of-Speech Tagging Using the Viterbi Algorithm
Discover how the Viterbi algorithm performs POS tagging.
Introduction to the Viterbi algorithm
The hidden Markov model (HMM) can find the most likely part-of-speech tag for a given word. For a sentence, however, we need the most likely sequence of part-of-speech tags rather than a tag for a single word. For this task, we can use the Viterbi algorithm, a dynamic programming algorithm developed for finding the most likely sequence of hidden states, known as the Viterbi path. The Viterbi algorithm does this by systematically "crawling" over all of the possible states of our HMM at each step in our sequence of words and using these calculations to find the maximal sequence of states.
The algorithm can be split into three steps:
Initialization step
Forward pass
Backward pass
It uses the transition probabilities and emission probabilities from the HMM to calculate two matrices:
- The matrix C (maximal probabilities) holds the intermediate optimal probabilities.
- The matrix D (maximal transition path) holds the indices of the visited states.
Both matrices have N rows, where N is the number of tags, and T columns, where T is the number of words in the sequence.
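As a minimal sketch, assuming N = 3 tags and a T = 4 word sentence, the two matrices could be allocated with NumPy as follows:

```python
import numpy as np

N = 3  # number of tags
T = 4  # number of words in the sequence

C = np.zeros((N, T))             # maximal probabilities
D = np.zeros((N, T), dtype=int)  # maximal transition path (state indices)

print(C.shape, D.shape)  # (3, 4) (3, 4)
```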
Viterbi setup
To explain the Viterbi algorithm, let's set up a small HMM:
Given a small set of part-of-speech tags: “Noun”, “Verb”, and “Adjective”, we want to find the most likely sequence of POS tags for the sentence: "The brown fox jumps." We will also add an “Initial” state to represent the start of the sentence.
Here is our HMM's transition matrix (A):
Transition Matrix (A)
|           | Noun | Verb | Adjective |
| --------- | ---- | ---- | --------- |
| Initial   | 0.5  | 0.2  | 0.3       |
| Noun      | 0.3  | 0.5  | 0.2       |
| Verb      | 0.2  | 0.1  | 0.7       |
| Adjective | 0.4  | 0.3  | 0.3       |
Similarly, the emission matrix (B) can be given as follows:
Emission Matrix (B)
|           | The | brown | fox | jumps |
| --------- | --- | ----- | --- | ----- |
| Noun      | 0.5 | 0.1   | 0.4 | 0     |
| Verb      | 0.1 | 0.1   | 0.1 | 0.7   |
| Adjective | 0.2 | 0.6   | 0.1 | 0.1   |
With this setup, let's use the Viterbi algorithm to find the most likely sequence of POS tags.
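Before walking through the steps in detail, here is a minimal NumPy sketch of the whole procedure using the transition and emission matrices above. The row/column ordering (Noun, Verb, Adjective) and all variable names are assumptions for illustration:

```python
import numpy as np

tags = ["Noun", "Verb", "Adjective"]
words = ["The", "brown", "fox", "jumps"]

# "Initial" row of the transition matrix A (start-of-sentence probabilities)
initial = np.array([0.5, 0.2, 0.3])

# Transition matrix A: rows = from-tag, columns = to-tag
A = np.array([
    [0.3, 0.5, 0.2],  # Noun ->
    [0.2, 0.1, 0.7],  # Verb ->
    [0.4, 0.3, 0.3],  # Adjective ->
])

# Emission matrix B: rows = tag, columns = word
B = np.array([
    [0.5, 0.1, 0.4, 0.0],  # Noun
    [0.1, 0.1, 0.1, 0.7],  # Verb
    [0.2, 0.6, 0.1, 0.1],  # Adjective
])

N, T = len(tags), len(words)
C = np.zeros((N, T))             # maximal probabilities
D = np.zeros((N, T), dtype=int)  # back-pointers (maximal transition path)

# Initialization: "Initial" transition times emission of the first word
C[:, 0] = initial * B[:, 0]

# Forward pass: for each word and tag, keep the best previous state
for t in range(1, T):
    for j in range(N):
        scores = C[:, t - 1] * A[:, j]
        D[j, t] = np.argmax(scores)
        C[j, t] = scores[D[j, t]] * B[j, t]

# Backward pass: follow back-pointers from the best final state
best = [int(np.argmax(C[:, T - 1]))]
for t in range(T - 1, 0, -1):
    best.insert(0, int(D[best[0], t]))

print([tags[i] for i in best])  # ['Noun', 'Adjective', 'Noun', 'Verb']
```

With these matrices, the algorithm tags "The brown fox jumps" as Noun, Adjective, Noun, Verb.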
Initializing the maximal probabilities matrix
During the initialization step, we fill in the first column of both our C and D matrices. The first column of C holds the probability of transitioning from our “Initial” state to each tag, given the first word. For this, we simply take our “Initial” state probabilities of each state from the first row of matrix ...