Introduction: Using Transformers to Generate Text
Get an overview of the topics that will be covered in this chapter.
The NLP domain has seen some remarkable leaps in the way we understand, represent, and process textual data: from handling long-range dependencies and sequences with LSTMs and GRUs to building dense vector representations with word2vec and related techniques. Yet with word embeddings becoming the de facto representation method and LSTMs serving as the workhorse for NLP tasks, we hit roadblocks to further improvement. This combination of embeddings and LSTMs was put to its best use in encoder-decoder style models and related architectures.
In the previous chapter, we briefly saw how research into and application of CNN-based architectures brought certain improvements to NLP use cases. In this chapter, we'll touch on the next set of enhancements that led to today's state-of-the-art transformer architectures. We'll focus on:
An overview of attention and how transformers changed the NLP landscape.
The GPT series of models, with a step-by-step guide to preparing a text-generation pipeline based on GPT-2.
We’ll cover topics such as attention, self-attention, contextual embeddings, and, finally, transformer architectures.
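As a quick preview of the GPT-2-based text-generation pipeline we'll build step by step later in the chapter, the sketch below shows what such a pipeline can look like with the Hugging Face `transformers` library. The model name, prompt, and generation parameters here are illustrative assumptions, not the chapter's final configuration.

```python
# A minimal sketch of a GPT-2 text-generation pipeline using the Hugging Face
# transformers library. Model name, prompt, and generation settings are
# illustrative assumptions, not the chapter's exact setup.
from transformers import pipeline

# Load a pretrained GPT-2 model behind the high-level text-generation pipeline.
generator = pipeline("text-generation", model="gpt2")

# Generate a short continuation of a seed prompt.
outputs = generator(
    "Transformers changed the NLP landscape by",
    max_length=50,          # cap on total tokens (prompt + continuation)
    num_return_sequences=1, # number of alternative continuations to return
)

print(outputs[0]["generated_text"])
```

Later sections unpack what happens inside this high-level call, from attention and self-attention to the contextual embeddings and transformer blocks that underpin the GPT family.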