Introduction: Applications of LSTMs—Generating Text

Get an overview of the applications of LSTM networks.

Now that we have a good understanding of the underlying mechanisms of LSTMs, such as how they solve the vanishing gradient problem and how their update rules work, we can look at how to use them in NLP tasks. LSTMs are employed for tasks such as text generation and image caption generation. Language modeling, for example, is at the core of any NLP task because the ability to model language effectively leads to effective language understanding. Therefore, language models are typically used for pretraining downstream decision-support NLP models. By itself, language modeling can be used to generate songs, movie scripts, and so on.

Writing folk stories with LSTMs

The application that we’ll cover is building an LSTM that can write new folk stories. For this task, we’ll download translations of some folk stories by the Grimm brothers. We’ll use these stories to train an LSTM and then ask it to output a fresh new story. We’ll process the text by breaking it into character-level bigrams (n-grams where n = 2) and build a vocabulary out of the unique bigrams. Note that representing bigrams as one-hot encoded vectors is very ineffective for machine learning models because it forces the model to treat each bigram as an independent unit of text, entirely unrelated to the other bigrams. But bigrams do share semantics: certain bigrams tend to co-occur, while others do not. One-hot encoding ignores this important property, which is undesirable. To leverage this property in our modeling, we’ll use an embedding layer and jointly train it with the model.
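To make the preprocessing concrete, here is a minimal sketch of splitting text into character-level bigrams and mapping each unique bigram to an integer ID that an embedding layer could later consume. The sample string and function name are illustrative only, and non-overlapping bigrams are assumed; the lesson itself works with the downloaded Grimm story text.

# Minimal sketch of the bigram preprocessing described above. The sample
# string and helper name are illustrative; the lesson uses the Grimm
# story text instead. Non-overlapping bigrams are assumed here.

def text_to_bigrams(text):
    """Split a string into character-level bigrams (n-grams with n = 2)."""
    return [text[i:i + 2] for i in range(0, len(text) - 1, 2)]

sample = "once upon a time there lived a king"
bigrams = text_to_bigrams(sample)

# Build a vocabulary that maps each unique bigram to an integer ID. These
# IDs are what an embedding layer would consume during training.
vocab = {bigram: idx for idx, bigram in enumerate(sorted(set(bigrams)))}
ids = [vocab[bigram] for bigram in bigrams]

print(bigrams[:5])           # e.g., ['on', 'ce', ' u', 'po', 'n ']
print(len(vocab), ids[:5])   # vocabulary size and the first few IDs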

We’ll also explore ways to implement previously described techniques, such as greedy sampling or beam search, for improving the quality of predictions. Afterward, we’ll see how we can implement time-series models other than standard LSTMs, such as GRUs.
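As a small taste of what's ahead, the sketch below contrasts greedy sampling with sampling from the model's predicted distribution. The tiny vocabulary and probabilities are made up for illustration; beam search, which keeps several candidate continuations instead of just one, is covered later in the course.

import numpy as np

# A hedged sketch of greedy sampling versus sampling from the predicted
# distribution. `probs` stands in for the softmax output of the trained
# LSTM at one time step; the tiny vocabulary here is purely hypothetical.

bigram_vocab = [' t', 'he', ' a', 'nd', 'er']
probs = np.array([0.40, 0.25, 0.20, 0.10, 0.05])

# Greedy sampling: always emit the single most probable next bigram.
greedy_pick = bigram_vocab[int(np.argmax(probs))]

# Sampling from the full distribution instead introduces variety, which
# helps avoid the repetitive text that pure greedy decoding can produce.
random_pick = np.random.choice(bigram_vocab, p=probs)

print(greedy_pick, random_pick)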
