Text-to-Text Transformer Models

Learn about text-to-text transformer models.

Google’s technical revolution in NLP started with the original transformer of Vaswani et al. (2017). “Attention Is All You Need” toppled more than 30 years of belief that RNNs and CNNs were the right tools for NLP tasks. It took us from the stone age of NLP/NLU to the 21st century in a long-overdue evolution.

The previous chapter summed up a second revolution that boiled up and erupted between Google’s original transformer (Vaswani et al., 2017) and OpenAI’s GPT-3 transformers (Brown et al., 2020). The original transformer focused on performance to prove that attention was all we needed for NLP/NLU tasks.

OpenAI’s second revolution, through GPT-3, focused on taking transformer models from fine-tuned pretrained models to few-shot models that require no fine-tuning. Its goal was to show that a machine can learn a language and apply it to downstream tasks as we humans do.

It is essential to know about those two revolutions to understand what T5 models represent. The first revolution was an attention technique. The second revolution was teaching a machine to understand a language (NLU) and then letting it solve NLP problems as we do.

In 2019, Google was thinking along the same lines as OpenAI: transformers could be perceived beyond technical considerations and taken to an abstract level of natural language understanding. These revolutions became disruptive. It was time to settle down, forget about source code and machine resources for a moment, and analyze transformers at a higher level.

Raffel et al. (2019) designed a conceptual text-to-text model and then implemented it. Let’s go through this representation of the second transformer revolution: abstract models.

The rise of text-to-text transformer models

Raffel et al. (2019) set out on a journey as pioneers with one goal: “Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer.” The Google team working on this approach emphasized from the start that it would not modify the original transformer’s fundamental architecture.

At that point, Raffel et al. (2019) wanted to focus on concepts, not techniques. They showed no interest in producing yet another so-called silver-bullet transformer model with n parameters and layers, as we so often see. This time, the T5 team wanted to find out how good transformers could be at understanding a language.

Humans learn a language and then apply that knowledge to a wide range of NLP tasks through transfer learning. The core concept of a T5 model is to find an abstract model that can do the same.
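To make the text-to-text idea concrete, here is a minimal sketch, assuming the Hugging Face transformers library and the public t5-small checkpoint (neither is prescribed by this chapter). Every task, whether translation, summarization, or grammatical acceptability, is written as plain text with a task prefix, and the model answers with plain text:

```python
# A minimal sketch of the text-to-text idea, assuming the Hugging Face
# transformers library and the public "t5-small" checkpoint.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Different NLP tasks, all expressed as input text with a task prefix.
tasks = [
    "translate English to German: The house is wonderful.",
    "summarize: The T5 team reframed every NLP problem as feeding the "
    "model text and reading text back, so one model can handle many tasks.",
    "cola sentence: The course is jumping well.",  # grammatical acceptability
]

for prompt in tasks:
    # The interface never changes: text in, text out.
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_new_tokens=40)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

The point of the sketch is not the library calls but the framing: one model, one input format, one output format, regardless of the downstream task.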

When we communicate, we always start with a sequence (A) followed by another sequence (B). B, in turn, becomes the start sequence leading to another sequence, as shown in the figure below:
