...


Networks for Sequence Data

Learn about the role of neural networks in natural language processing.

In addition to image data, natural language text has been a frequent topic of interest in neural network research. However, unlike the datasets we’ve examined so far, language has a distinct order that is important to its meaning. Therefore, to accurately capture the patterns in language or time-dependent data, it is necessary to utilize networks designed for this purpose.

Recurrent neural networks (RNNs)

Let’s imagine we are trying to predict the next word in a sentence, given the words up to this point. A neural network attempting this prediction would need to take into account not only the current word but a variable number of prior inputs. If we instead used a simple feedforward MLP, the network could only process the entire sentence as a single fixed-length vector or each word as an isolated vector.
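For concreteness, here is a minimal sketch of the fixed-length option: forcing variable-length sentences to a common length by padding. The example sentences and the `<pad>` token are illustrative, not from the source.

```python
# Hypothetical illustration: padding variable-length sentences so a
# feedforward MLP can treat each one as a fixed-length input.
sentences = [
    ["the", "cat", "sat"],
    ["the", "dog", "chased", "the", "ball"],
]

max_len = max(len(s) for s in sentences)  # 5

padded = [s + ["<pad>"] * (max_len - len(s)) for s in sentences]
# [['the', 'cat', 'sat', '<pad>', '<pad>'],
#  ['the', 'dog', 'chased', 'the', 'ball']]
```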

This introduces a dilemma: either we pad variable-length inputs to a common length, which preserves no notion of relevance (that is, of which words in the sentence matter most for generating the next prediction), or we use only the last word at each step as the input, which discards the context of the rest of the sentence and all the information it provides. This kind of problem inspired the “vanilla” RNN (LeCun, Y., Bengio, Y. & Hinton, G. (2015). Deep learning. Nature 521, 436–444. https://www.nature.com/articles/nature14539.epdf), which incorporates not only the current input but also the prior step’s hidden state when computing a neuron’s output:

$$h_t = \tanh\left(W_{xh}\,x_t + W_{hh}\,h_{t-1} + b_h\right)$$

where $x_t$ is the input at timestep $t$, $h_{t-1}$ is the hidden state from the previous step, and $W_{xh}$, $W_{hh}$, and $b_h$ are learned parameters.
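A minimal NumPy sketch of this update follows; the function name `rnn_step` and the weight names `W_xh` (input-to-hidden), `W_hh` (hidden-to-hidden), and `b_h` are illustrative choices, not from the source.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One vanilla RNN step: mix the current input with the prior hidden state."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

# Toy dimensions: 4-dimensional word vectors, 3-dimensional hidden state.
rng = np.random.default_rng(0)
W_xh = rng.normal(size=(3, 4))  # input-to-hidden weights
W_hh = rng.normal(size=(3, 3))  # hidden-to-hidden (recurrent) weights
b_h = np.zeros(3)

h = np.zeros(3)         # initial hidden state
x = rng.normal(size=4)  # one input word vector
h = rnn_step(x, h, W_xh, W_hh, b_h)
```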

One way to visualize this is to imagine each layer feeding recursively into the next timestep in the sequence. In effect, if we “unroll” the network across the sequence, we end up with a very deep neural network in which every layer shares the same weights (Olah (2015). Understanding LSTM Networks. colah's blog. https://colah.github.io/posts/2015-08-Understanding-LSTMs/), as shown in the figure below (Mozer, M. C. (1995). A Focused Backpropagation Algorithm for Temporal Pattern Recognition. In Chauvin, Y. & Rumelhart, D. (eds.), Backpropagation: Theory, Architectures, and Applications. Hillsdale, NJ: Lawrence Erlbaum Associates, pp. 137–169).

The unrolled RNN
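To connect the figure to code, here is a hedged sketch of the unrolled computation, assuming the same illustrative weight shapes as above. Note that the single set of weights is reused at every timestep.

```python
import numpy as np

rng = np.random.default_rng(0)
W_xh = rng.normal(size=(3, 4))  # input-to-hidden weights
W_hh = rng.normal(size=(3, 3))  # hidden-to-hidden (recurrent) weights
b_h = np.zeros(3)

def rnn_step(x_t, h_prev):
    # Same update as before; the weights are fixed across timesteps.
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

sequence = rng.normal(size=(6, 4))  # six timesteps of 4-dim input vectors
h = np.zeros(3)                     # initial hidden state
hidden_states = []
for x_t in sequence:                # the "unrolled" loop: one layer per step
    h = rnn_step(x_t, h)
    hidden_states.append(h)
# hidden_states[t] summarizes the sequence up to and including step t;
# backpropagating through this loop trains one shared set of weights.
```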

The same difficulties that characterize training deep feedforward networks also apply to RNNs: gradients tend to vanish over many timesteps with traditional activation functions (or explode if the gradients ...