Networks for Sequence Data

Learn about the role of neural networks in natural language processing.

In addition to image data, natural language text has been a frequent topic of interest in neural network research. However, unlike the datasets we’ve examined so far, language has a distinct order that is important to its meaning. Therefore, to accurately capture the patterns in language or other time-dependent data, we need network architectures designed to handle sequential inputs.

Recurrent neural networks (RNNs)

Let’s imagine we are trying to predict the next word in a sentence, given the words up until this point. A neural network that attempted to predict the next word would need to take into account not only the current word but a variable number of prior inputs. If we used a simple feedforward MLP instead, the network would have to process its input as a fixed-size vector: either the entire sentence at once or each word on its own.

This introduces a dilemma: either we pad variable-length inputs to a common length, losing any notion of relevance (that is, which words in the sentence matter more than others in generating the next prediction), or we use only the last word at each step as the input, which discards the context of the rest of the sentence and all the information it provides. This kind of problem inspired the “vanilla” RNN (LeCun, Y., Bengio, Y. & Hinton, G. (2015). Deep learning. Nature 521, 436–444. https://www.nature.com/articles/nature14539.epdf), which incorporates not only the current input but also the prior step’s hidden state in computing a neuron’s output. In its standard form, this recurrence is:
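h_t = \tanh(W_{xh} x_t + W_{hh} h_{t-1} + b_h)

Here, x_t is the input at the current step, h_{t-1} is the hidden state carried over from the previous step, W_{xh} and W_{hh} are learned weight matrices, b_h is a bias term, and tanh is a typical choice of nonlinearity. Because h_{t-1} itself depends on every earlier input, the hidden state acts as a running summary of the sequence so far.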
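To make the recurrence concrete, here is a minimal NumPy sketch of a vanilla RNN cell. The class name, dimensions, and initialization scheme are illustrative assumptions rather than a reference implementation:

```python
import numpy as np

class VanillaRNNCell:
    """A minimal vanilla RNN cell (illustrative sketch, not a library API)."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        # Weight matrices for the current input and the prior hidden state.
        self.W_xh = rng.normal(0.0, 0.1, (hidden_size, input_size))
        self.W_hh = rng.normal(0.0, 0.1, (hidden_size, hidden_size))
        self.b_h = np.zeros(hidden_size)

    def step(self, x_t, h_prev):
        # h_t depends on both the current input x_t and the prior state h_prev,
        # which is what lets the network carry context across the sequence.
        return np.tanh(self.W_xh @ x_t + self.W_hh @ h_prev + self.b_h)

# The same cell is applied at every step, so a variable-length sequence
# needs no padding to a fixed length.
cell = VanillaRNNCell(input_size=4, hidden_size=8)
h = np.zeros(8)
sequence = [np.random.rand(4) for _ in range(5)]  # e.g., 5 word embeddings
for x_t in sequence:
    h = cell.step(x_t, h)
print(h.shape)  # (8,) -- a fixed-size summary of the whole sequence
```

Note that the final hidden state has the same shape regardless of how many steps the sequence contained, which is exactly the property the padded-input MLP approach lacks.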
