Improving Sequential Models: Beam Search

Learn how beam search improves LSTM predictions.

As we saw earlier, the generated text leaves room for improvement. Now let's see whether beam search can help improve it. The standard way to predict from a language model is to predict one step at a time, using the prediction from the previous time step as the new input. In beam search, we instead predict several steps ahead before picking the next input.
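For reference, here is a minimal sketch of that standard one-step-at-a-time decoding loop. The `predict_fn` name is a hypothetical placeholder for a function that takes the current sequence of token IDs and returns a probability distribution over the vocabulary:

```python
import numpy as np

def greedy_decode(predict_fn, seed_ids, num_steps):
    """Standard decoding: at each step, take the most probable next token
    and feed it back in as the new input (predict_fn is a placeholder)."""
    sequence = list(seed_ids)
    for _ in range(num_steps):
        probs = predict_fn(sequence)      # distribution over the vocabulary
        next_id = int(np.argmax(probs))   # pick the single best token
        sequence.append(next_id)
    return sequence
```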

This enables us to pick output sequences that may not look attractive when taken individually but are better when considered as a sequence. Beam search works by, at a given time step, predicting $m^n$ candidate output sequences, or beams, where $m$ is known as the beam width and $n$ is the beam depth. Each output sequence (or beam) is $n$ bigrams predicted into the future. We compute the joint probability of each beam by multiplying the individual prediction probabilities of the items in that beam. We then pick the beam with the highest joint probability as our output sequence for that time step. Note that this is a greedy search: we keep only the best candidates at each depth of the tree as it grows, so the search is not guaranteed to find the globally best beam. The figure below shows an example. We'll indicate the best beam candidates (and their probabilities) with bold font and red arrows:
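The sketch below illustrates this procedure under the same assumption of a hypothetical `predict_fn`; the names `beam_width` and `beam_depth` are illustrative, and log probabilities are summed rather than multiplying raw probabilities purely for numerical stability:

```python
import numpy as np

def beam_search(predict_fn, seed_ids, beam_width, beam_depth):
    """Keep the beam_width most probable candidate sequences at each depth,
    scoring each beam by the product of its step probabilities
    (accumulated as a sum of log probabilities)."""
    beams = [(list(seed_ids), 0.0)]            # (sequence, log probability)
    for _ in range(beam_depth):
        candidates = []
        for sequence, log_prob in beams:
            probs = predict_fn(sequence)       # distribution over the vocabulary
            # Expand each beam with its beam_width best next tokens.
            top_ids = np.argsort(probs)[-beam_width:]
            for token_id in top_ids:
                candidates.append(
                    (sequence + [int(token_id)],
                     log_prob + np.log(probs[token_id]))
                )
        # Greedily keep only the beam_width best candidates at this depth.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams[0][0]                         # highest-scoring beam
```

Because only the top `beam_width` candidates survive at each depth, the search is greedy at every level, which is why it does not guarantee the globally best beam.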
