...


Part-of-Speech Tagging Using the Viterbi Algorithm

Discover how the Viterbi algorithm performs POS tagging.

Introduction to the Viterbi algorithm

A hidden Markov model (HMM) can tell us the most likely part of speech for a single word. For a sentence, however, we need the most likely sequence of part-of-speech tags rather than a tag for one word. For this task, we can use the Viterbi algorithm, a dynamic programming algorithm that finds the most likely sequence of hidden states, also called the Viterbi path. It does this by systematically evaluating every possible state of our HMM at each step in the sequence of words, and then tracing back through these calculations to recover the most probable sequence of states.

The algorithm can be split into three steps:

  • Initialization step

  • Forward pass

  • Backward pass

It uses the transition probabilities and emission probabilities from the HMM to calculate two matrices:

  • The matrix C (maximal probabilities) holds the intermediate optimal probabilities.
  • The matrix D (maximal transition path) holds the indices of the visited states.

Both are of size n × k, where n is the number of tags (rows) and k is the number of words in the sequence (columns).
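As a rough sketch of this bookkeeping (assuming NumPy; the dimensions match the example we set up below), the two matrices can be allocated like this:

```python
import numpy as np

n = 3  # number of tags: Noun, Verb, Adjective
k = 4  # number of words in the sequence

# C[i, j]: highest probability of any tag sequence that ends in tag i at word j
C = np.zeros((n, k))

# D[i, j]: index of the previous tag on that best path, used for backtracking
D = np.zeros((n, k), dtype=int)
```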

Viterbi setup

To explain the Viterbi algorithm, let's set up a small HMM:

Given a small set of part-of-speech tags: “Noun”, “Verb”, and “Adjective”, we want to find the most likely sequence of POS tags for the sentence "The brown fox jumps." We will also add an “Initial” state to represent the start of the sentence.

Here is our HMM's transition matrix (A):

Transition Matrix (A)

              Noun    Verb    Adjective
  Initial     0.5     0.2     0.3
  Noun        0.3     0.5     0.2
  Verb        0.2     0.1     0.7
  Adjective   0.4     0.3     0.3

Similarly, the emission matrix (B) can be given as follows:

Emission Matrix (B)

              The     brown   fox     jumps
  Noun        0.5     0.1     0.4     0
  Verb        0.1     0.1     0.1     0.7
  Adjective   0.2     0.6     0.1     0.1
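Before walking through the algorithm, here is a minimal sketch of this setup in Python with NumPy. The names `A`, `B`, `tags`, and `words`, as well as the row and column orderings, are just our conventions for this example:

```python
import numpy as np

tags = ["Noun", "Verb", "Adjective"]
words = ["The", "brown", "fox", "jumps"]

# Transition matrix A: rows are the current state (Initial, Noun, Verb,
# Adjective), columns are the next tag (Noun, Verb, Adjective).
A = np.array([
    [0.5, 0.2, 0.3],   # Initial
    [0.3, 0.5, 0.2],   # Noun
    [0.2, 0.1, 0.7],   # Verb
    [0.4, 0.3, 0.3],   # Adjective
])

# Emission matrix B: rows are tags, columns are the words of our sentence.
B = np.array([
    [0.5, 0.1, 0.4, 0.0],   # Noun
    [0.1, 0.1, 0.1, 0.7],   # Verb
    [0.2, 0.6, 0.1, 0.1],   # Adjective
])
```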

With this setup, let's use the Viterbi algorithm to find the most likely sequence of POS tags.

Initializing the maximal probabilities matrix

During the initialization step, we fill in the first column of both our C and D matrices. The first column of C represents the probability of transitioning from our “Initial” state to each possible tag for the first word. For this, we simply take our “Initial” state probabilities of each state from the first row of matrix ...
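The paragraph above is cut off, but the initialization it begins to describe can be sketched as follows, continuing the NumPy snippets above. Note that multiplying in the emission probabilities of the first word is our assumption based on the standard Viterbi initialization; the text so far only mentions the transition probabilities:

```python
# Initialization: first row of A ("Initial" -> tag) times each tag's
# emission probability for the first word, "The".
C[:, 0] = A[0, :] * B[:, 0]
# C[:, 0] = [0.5*0.5, 0.2*0.1, 0.3*0.2] = [0.25, 0.02, 0.06]

# No real predecessor exists for the first word, so D's first column is 0.
D[:, 0] = 0
```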