Inference: Generating Sequences with GANs
Explore the difference between models trained on words vs. characters and sample the output.
In this lesson, we are going to consider both of our experimental setups: one where the generator and discriminator predict and discriminate sequences of words, and another where the models predict and discriminate sequences of characters. Note that in both cases, there is no difference between the representation of a word and that of a character; they are just vectors in multidimensional space.
Assuming the same sequence length, predicting a sequence of characters is harder than predicting a sequence of words. First, in the character case, the model has to perform more predictions to cover the same amount of text. Second, the overall entropy, or uncertainty, is higher when predicting characters: the model must predict a sequence of characters that forms one word, and then another sequence of characters that forms a word likely to follow the one it just predicted.
Although language is inherently sequential and words are, most of the time, sampled conditioned on the words that precede them, we are able to train an unconditional toy model of language with the GAN framework. Let's look at the output of the model trained to predict sequences of words first.
Model trained on words
The code for sampling the generator is straightforward. First, we load the model and create a z vector. We then get the output from the generator, which provides a score for each token (the logits), and sample with the mode of the distribution, that is, by taking the index with the highest score at each position and converting it into a word using our token_to_id function:
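The snippet below is a minimal sketch of this procedure in PyTorch. The file names (generator.pt, token_to_id.pt), the latent size Z_DIM, and the generator's output shape are illustrative assumptions rather than the exact course code; the token_to_id mapping is inverted here so that sampled indices can be turned back into words.

```python
import torch

# Illustrative assumptions: the generator was saved with torch.save,
# the latent vector has Z_DIM dimensions, and the generator returns
# logits of shape (batch, seq_len, vocab_size).
Z_DIM = 100

# Load the trained generator and switch it to evaluation mode.
generator = torch.load("generator.pt", map_location="cpu")
generator.eval()

# token_to_id maps words to integer indices; invert it to map indices back to words.
token_to_id = torch.load("token_to_id.pt")  # e.g. {"the": 0, "cat": 1, ...}
id_to_token = {idx: tok for tok, idx in token_to_id.items()}

with torch.no_grad():
    # Create a random z vector (one sample from the latent prior).
    z = torch.randn(1, Z_DIM)

    # The generator outputs unnormalized scores (logits) for every token
    # at every position in the sequence.
    logits = generator(z)

    # Sample with the mode of the distribution: take the index with the
    # highest score at each position.
    ids = logits.argmax(dim=-1).squeeze(0)

# Convert the sampled indices back into words.
words = [id_to_token[int(i)] for i in ids]
print(" ".join(words))
```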