Text Generation via SeqGAN

Explore the concept of text generation via SeqGAN and NLP.

In this chapter, we will work on GANs that directly generate sequential data, such as text and audio. While doing so, we will rely on what we learned from image-synthesizing models so that we can become familiar with NLP models quickly.

Teaching GANs how to tell jokes

When it comes to the generation of text, the biggest difference from image generation is that text data is discrete, while image pixel values are more continuous, even though digital images and text are both essentially discrete. A pixel typically has 256 possible values, and slight changes in a few pixels won’t necessarily affect the image’s meaning to us. However, a slight change in a sentence, even a single letter (for example, turning we into he), may change the whole meaning of the sentence. Also, we tend to have a higher tolerance for synthesized images than for synthesized text. For example, if 90% of the pixels in the generated image of a dog are nearly perfect, we may have little trouble recognizing the dog because our brains are smart enough to automatically fill in the missing pixels. However, if every one out of 10 words in a piece of news doesn’t make any sense, we will definitely find it hard to enjoy reading it. This is why text generation is hard and why progress in text generation has been less remarkable than in image synthesis.

SeqGAN (Yu, Lantao, Weinan Zhang, Jun Wang, and Yong Yu. "SeqGAN: Sequence generative adversarial nets with policy gradient." In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 31, no. 1, 2017.) was one of the first successful attempts at text generation with adversarial learning. In this chapter, we will walk through the design of SeqGAN, how to create our own vocabulary for NLP tasks, and how to train SeqGAN so that it can generate short jokes.

Design of SeqGAN—GAN, LSTM, and RL

Like other GAN models, SeqGAN is built upon the idea of adversarial learning. However, some major changes have to be made so that it can accommodate NLP tasks. For example, the generator network is built with LSTM instead of CNNs. Also, reinforcement learning is used to optimize the discrete objective, unlike the SGD-family methods that are usually used in GAN models.

Here, we will provide a quick introduction to LSTM and RL. However, we won’t go too deep into these topics since we want to focus on the adversarial learning part of the model.
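To make these two ideas concrete before we dive in, here is a minimal PyTorch sketch (not the actual SeqGAN implementation we will build later) of an LSTM-based generator that emits discrete tokens, together with a REINFORCE-style policy-gradient update in which the discriminator's score would serve as the reward. The vocabulary size, embedding size, hidden size, and the random placeholder reward are arbitrary choices made purely for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LSTMGenerator(nn.Module):
    def __init__(self, vocab_size=5000, embed_dim=32, hidden_dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, state=None):
        # tokens: (batch, seq_len) of token ids -> logits over the vocabulary
        emb = self.embed(tokens)
        output, state = self.lstm(emb, state)
        return self.out(output), state

gen = LSTMGenerator()
start = torch.zeros(4, 1, dtype=torch.long)   # a "start" token for a batch of 4 samples

# Sample the next token (a discrete action) and score it with a policy-gradient loss.
logits, state = gen(start)
probs = F.softmax(logits[:, -1, :], dim=-1)
dist = torch.distributions.Categorical(probs)
token = dist.sample()                         # discrete choice: no gradient flows through it
reward = torch.rand(4)                        # placeholder for the discriminator's score
pg_loss = -(dist.log_prob(token) * reward).mean()
pg_loss.backward()                            # gradients reach the generator via the log-probabilities
```

In SeqGAN itself, the reward comes from the discriminator's judgment of generated sequences (estimated with Monte Carlo rollouts for partially generated sentences) rather than a random placeholder, which is how the discrete sampling step stops being an obstacle to training.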

A quick introduction to RNN and LSTM

Recurrent neural networks (RNNs) are designed to process sequential data such as text and audio. Their biggest difference from CNNs is that the weights in the hidden layers (that is, certain functions) are applied repeatedly to multiple inputs, and the order of the inputs affects the final results of those functions. The typical design of an RNN can be seen in the following diagram:

Basic computational units of a recurrent neural network

As we can see, the most distinctive characteristic of an RNN unit is that the hidden state, $h_t$, has an outgoing connection pointing to itself. This self-loop is where the name “recurrent” comes from. Let’s say the self-loop is performed three times. The extended version of this computational unit is shown on the right in the preceding diagram. The computational process is expressed as follows:
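One common way to write this unrolled recurrence (a sketch in generic notation; the weight matrices $W_x$ and $W_h$, the bias $b$, and the activation $f$ are our own symbols and may differ from those used in the figure) is:

$$
h_t = f\left(W_x x_t + W_h h_{t-1} + b\right), \qquad t = 1, 2, 3
$$

Here, $h_0$ is the initial hidden state, and the same weights are reused at every step, which is exactly the self-loop shown in the diagram.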

Thus, after proper training, this RNN unit is capable of handling sequential data with a maximum length of 3. RNNs are widely used in voice recognition, natural language translation, language modeling, and image captioning. However, a critical flaw remains in RNNs, which we need to address with LSTM.

An RNN model assumes that strong connections only exist between neighboring inputs (for example, $x_1$ and $x_2$, as shown in the preceding diagram) and that the connections between inputs that are far apart from each other can be ignored (for example, $x_1$ and $x_3$). This becomes troublesome when we try to translate a long sentence into another language that has totally different grammatical rules, and we need to look through all parts of the sentence to make sense of it.

Long Short-Term Memory (LSTM) (Hochreiter, Sepp, and Jürgen Schmidhuber. "Long short-term memory." Neural Computation 9, no. 8 (1997): 1735-1780.) is used to preserve the long-term memory of sequential data and to address the gradient explosion and vanishing issues in RNNs. Its computational process is illustrated in the following diagram:

Computational process of LSTM

As we can see, an additional term, ...