Natural Language Generation with GANs

Get familiar with the challenges in natural language generation and the differences between discriminative and generative modeling methods.

Within the field of natural language processing (NLP), natural language generation is one of the most challenging tasks in machine learning. Broadly speaking, it is easier to estimate the parameters of a discriminative model than those of a generative model. On Quora, Ian Goodfellow gives a good informal explanation that can be generalized to language:

Can you look at a painting and recognize it as being the Mona Lisa? You probably can. That’s discriminative modeling. Can you paint the Mona Lisa yourself? You probably can’t. That’s generative modeling.

The task of modeling language has been approached with both rule-based and data-driven models, including deep learning. As Ziang Xie explains informally in his practical guide to neural text generation, deep learning models for text generation are very flexible and expressive at the price of being somewhat unpredictable and hard to control; conversely, rule-based models are predictable and easy to control but not very expressive or flexible.

Until recently, a large body of natural language processing systems relied on n-gram language models. At scale, such models require prohibitive amounts of feature engineering and expert knowledge. Lately, approaching these problems with neural networks has emerged as a viable end-to-end alternative that reduces the dependency on feature engineering and domain experts, at the cost of limited model interpretability and predictability.
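To make the idea of an n-gram language model concrete, here is a minimal sketch of a bigram model estimated from raw counts. The toy corpus and the `bigram_prob` helper are illustrative assumptions, not part of any particular library.

```python
from collections import Counter, defaultdict

# Toy corpus; a real n-gram model is estimated from millions of sentences.
corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
]

# Count how often each token follows each other token.
bigram_counts = defaultdict(Counter)
for sentence in corpus:
    tokens = ["<s>"] + sentence + ["</s>"]
    for prev, curr in zip(tokens, tokens[1:]):
        bigram_counts[prev][curr] += 1

def bigram_prob(prev, curr):
    """Maximum-likelihood estimate of P(curr | prev) from the counts."""
    total = sum(bigram_counts[prev].values())
    return bigram_counts[prev][curr] / total if total else 0.0

print(bigram_prob("the", "cat"))  # 0.25: "cat" follows "the" once out of four occurrences
```

At scale, such count-based models additionally need smoothing, back-off schemes, and hand-crafted features to cope with unseen n-grams, which is exactly the engineering burden described above.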

Autoregressive models for language generation

In autoregressive models, language generation is framed as the problem of predicting the next token given the previously generated tokens. As we saw earlier, this formulation generalizes to time series from other domains; in the “Speech Enhancement with GANs” chapter, we will revisit it in the audio domain. The approach is appealing because it explicitly models the dependencies between the tokens of a sentence or time series, as sketched in the example below.
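The following is a minimal sketch of this token-by-token loop using greedy decoding. The `next_token_distribution` function, the toy vocabulary, and the probability table are assumptions that stand in for a trained autoregressive model.

```python
import numpy as np

VOCAB = ["<s>", "the", "cat", "sat", "</s>"]

def next_token_distribution(prefix):
    """Stand-in for a trained model: returns P(next token | prefix).

    This toy version only looks at the last token of the prefix; a real
    autoregressive model (RNN, Transformer, ...) conditions on the
    entire prefix.
    """
    table = {
        "<s>":  [0.00, 0.90, 0.05, 0.03, 0.02],
        "the":  [0.00, 0.00, 0.70, 0.20, 0.10],
        "cat":  [0.00, 0.00, 0.00, 0.80, 0.20],
        "sat":  [0.00, 0.10, 0.10, 0.00, 0.80],
        "</s>": [0.00, 0.00, 0.00, 0.00, 1.00],
    }
    return np.array(table[prefix[-1]])

def generate(max_len=10):
    """Greedy decoding: repeatedly append the most probable next token."""
    tokens = ["<s>"]
    for _ in range(max_len):
        probs = next_token_distribution(tokens)
        tokens.append(VOCAB[int(np.argmax(probs))])
        if tokens[-1] == "</s>":  # stop once the end-of-sentence token is produced
            break
    return tokens

print(generate())  # ['<s>', 'the', 'cat', 'sat', '</s>']
```

In practice, tokens are usually sampled from the predicted distribution (often with temperature or top-k truncation) rather than chosen greedily, and the lookup table is replaced by a neural network that conditions on the whole prefix.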

Mathematically speaking, autoregressive models learn the distribution of a sentence $x$ by factorizing it into a product of conditional probabilities:
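$$
p(x) = \prod_{t=1}^{T} p(x_t \mid x_1, \ldots, x_{t-1}),
$$

where $x = (x_1, \ldots, x_T)$ is the sequence of tokens and $T$ is its length; each factor is the probability of the next token given all of the tokens that precede it.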
