How is the transformer model used for text generation?

Key takeaways:

  • Transformers replace RNNs/LSTMs for NLP with efficient multi-head attention. They use encoders (process input) and decoders (generate output).

  • Pretrained transformer models like BART-large enable easy text generation.

  • Tokenization converts input text into numerical IDs for processing, and beam search improves output quality by exploring multiple candidate sequences.

  • Generated tokens are decoded into clean, readable text.

  • Transformers are scalable, efficient, and versatile across ML tasks.

The transformer model is a type of deep learning neural network that serves as an efficient replacement for recurrent neural networks (RNNs) and long short-term memory (LSTM) networks in various natural language processing (NLP) tasks. It was developed at Google and introduced in the groundbreaking 2017 paper “Attention Is All You Need,” and it is built around the multi-head attention mechanism. This design lets it handle sequential data more efficiently than the earlier architectures.

We’ll see how a transformer model generates text from input data using a pretrained model.

Workflow

Let’s understand how the transformer model works. It has two main components: an encoder and a decoder. The encoder processes the input data and passes a representation of that input to the decoder. The decoder receives this representation and generates the output sequence token by token to produce the text.

Workflow of a transformer model
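
To make the encoder/decoder split concrete, the Hugging Face API lets you run the two components separately. The snippet below is a minimal illustrative sketch, not part of the original walkthrough; it uses the same model and input as the steps that follow.

from transformers import BartTokenizer, BartForConditionalGeneration

tokenizer = BartTokenizer.from_pretrained('facebook/bart-large-cnn')
model = BartForConditionalGeneration.from_pretrained('facebook/bart-large-cnn')

inputs = tokenizer("It was a dark and stormy night...", return_tensors='pt')

# The encoder maps the token IDs to a contextual representation...
encoder_outputs = model.get_encoder()(input_ids=inputs['input_ids'])
print(encoder_outputs.last_hidden_state.shape)  # (batch, sequence length, hidden size)

# ...which the decoder consumes step by step when generating output tokens.
output_ids = model.generate(inputs['input_ids'], max_length=50)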

Using a transformer model for text generation

Suppose we have the input “It was a dark and stormy night...” and we want to generate text from it with a transformer model. Let’s go through the steps one by one for better understanding.

We’ll use the pretrained BART-large model from the Hugging Face Transformers library.

Step 1: Import libraries

We need to import the required Python libraries and modules for text generation. These include the BartTokenizer and BartForConditionalGeneration from the transformers library, as well as logging, os, and warnings for environment configuration and debugging.

from transformers import BartTokenizer, BartForConditionalGeneration
import logging
import os
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)

Step 2: Configure logging and environment

To reduce unnecessary logs and keep the output clean, configure the logging level for the transformers library and suppress TensorFlow warnings.

logging.getLogger('transformers').setLevel(logging.ERROR)
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

Step 3: Load the tokenizer and model

Load the pretrained BART-large CNN model and tokenizer.

  • Tokenizer: This splits input text into tokens and converts them into numerical IDs.

  • Model: The transformer model is responsible for generating text.

tokenizer = BartTokenizer.from_pretrained('facebook/bart-large-cnn')
model = BartForConditionalGeneration.from_pretrained('facebook/bart-large-cnn')
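
As an illustrative aside (not part of the original steps), you can inspect what the tokenizer produces for a short string:

# Split text into subword tokens, then map them to numerical IDs.
tokens = tokenizer.tokenize("It was a dark and stormy night")
ids = tokenizer.convert_tokens_to_ids(tokens)
print(tokens)  # subword tokens, e.g. ['It', 'Ġwas', 'Ġa', ...] (Ġ marks a leading space)
print(ids)     # the corresponding numerical IDs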

Step 4: Define input data

Define the input text that will be used as a prompt for text generation.

inputText = """
It was a dark and stormy night...
"""

Step 5: Tokenize input data

Tokenize the input text using the tokenizer. This step prepares the text for the model by converting it into numerical representations (input IDs).

  • Parameters:

    • return_tensors='pt': Returns the tokenized data as PyTorch tensors.

    • max_length=1024: Limits the input length to 1024 tokens.

    • truncation=True: Truncates text longer than the maximum length.

inputs = tokenizer(inputText, return_tensors='pt', max_length=1024, truncation=True)
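
As a quick, illustrative sanity check, you can confirm the shape of the resulting tensor:

# inputs is a dict-like BatchEncoding; 'input_ids' is a PyTorch tensor of token IDs.
print(inputs['input_ids'].shape)  # torch.Size([1, sequence_length])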

Step 6: Generate text

Use the model to generate a continuation of the input text.

  • Key parameters:

    • inputs['input_ids']: The tokenized input text.

    • num_beams=4: Uses beam search with 4 beams to improve output quality by exploring multiple generation paths.

    • max_length=100: Limits the length of the generated output.

    • early_stopping=True: Ends beam search as soon as enough complete candidate sequences are found.

summary_ids = model.generate(
    inputs['input_ids'],
    num_beams=4,
    max_length=100,
    early_stopping=True
)
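
Beam search is deterministic, so it returns the same output for the same input. If more varied text is desired, sampling-based decoding is a common alternative; the parameter values below are illustrative and not from the original walkthrough.

# Alternative decoding: nucleus (top-p) sampling for more diverse output.
sampled_ids = model.generate(
    inputs['input_ids'],
    do_sample=True,   # sample from the model's probability distribution
    top_p=0.9,        # keep only the smallest token set with cumulative probability 0.9
    max_length=100
)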

Step 7: Decode and display output

Convert the generated token IDs back to readable text using the tokenizer.

  • Parameters:

    • skip_special_tokens=True: Removes special tokens such as <s>, </s>, and <pad>.

    • clean_up_tokenization_spaces=True: Cleans up extra spaces in the generated text.

summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True, clean_up_tokenization_spaces=True)
print(summary)
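
To see why skip_special_tokens matters, you can decode once without it (illustrative; the exact markers depend on the model):

# Without skip_special_tokens, BART's sequence markers remain in the string.
raw = tokenizer.decode(summary_ids[0], skip_special_tokens=False)
print(raw)  # e.g. "</s><s>The wind howled ...</s>"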

Final output

The generated text will be printed as a continuation of the input. For example:

The wind howled through the trees, and rain lashed against the windows of an old house standing in solitude...

This step-by-step process demonstrates how to use a pretrained transformer model for efficient text generation.

Complete code

The code example below demonstrates the use of BART for conditional text generation, which is commonly used for tasks like text summarization. The facebook/bart-large-cnn model is fine-tuned for summarization, which makes it well suited to generating concise summaries from longer text inputs.

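The consolidated script below simply assembles the steps above into one runnable file:

from transformers import BartTokenizer, BartForConditionalGeneration
import logging
import os
import warnings

# Suppress noisy warnings and logs
warnings.filterwarnings("ignore", category=FutureWarning)
logging.getLogger('transformers').setLevel(logging.ERROR)
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

# Load the pretrained tokenizer and model
tokenizer = BartTokenizer.from_pretrained('facebook/bart-large-cnn')
model = BartForConditionalGeneration.from_pretrained('facebook/bart-large-cnn')

# Define and tokenize the input text
inputText = """
It was a dark and stormy night...
"""
inputs = tokenizer(inputText, return_tensors='pt', max_length=1024, truncation=True)

# Generate output with beam search
summary_ids = model.generate(
    inputs['input_ids'],
    num_beams=4,
    max_length=100,
    early_stopping=True
)

# Decode and display the result
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True, clean_up_tokenization_spaces=True)
print(summary)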

In conclusion, the transformer model represents a prominent advancement in natural language processing because of its scalability, efficiency, and versatility. Developers can leverage this state-of-the-art deep neural architecture not only for NLP tasks but also in other areas of machine learning.

Frequently asked questions



Can transformers be used for text classification?

Yes, transformers are widely used for text classification due to their ability to capture context and relationships within text using self-attention mechanisms.
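
For example, a ready-made classifier can be loaded in a few lines with the Hugging Face pipeline API (a minimal sketch; the library picks a default pretrained model):

from transformers import pipeline

# Loads a default pretrained text-classification (sentiment analysis) model.
classifier = pipeline("sentiment-analysis")
print(classifier("Transformers make text classification easy!"))
# e.g. [{'label': 'POSITIVE', 'score': 0.999...}]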


Which model is used for text classification?

Pretrained models like BERT, RoBERTa, and DistilBERT are commonly used for text classification tasks.


Which model is best for NLP text classification?

BERT and RoBERTa are highly effective for NLP text classification due to their strong contextual understanding and fine-tuning capabilities.


