A question-answering system extracts relevant answers from a given context and is used in chatbots, search engines, and virtual assistants.
Key takeaways:
Transformer models replace RNNs/LSTMs in NLP, using attention mechanisms for efficiency. They consist of an encoder for input processing and a decoder for output generation.
Pretrained transformer models like BERT answer questions by extracting answer spans from a given context.
For question-answering, input text is tokenized, structured with special tokens, and converted to IDs for processing. The model predicts start and end indices to extract and display answers.
Transformers are scalable, versatile, and essential for modern NLP tasks.
The Transformer is a deep learning architecture that serves as an efficient replacement for recurrent neural networks (RNNs) and long short-term memory (LSTM) networks in various natural language processing (NLP) tasks. It was developed at Google and proposed in the groundbreaking 2017 paper "Attention Is All You Need," and it is built around the multi-head attention mechanism. It is designed to handle sequential data more efficiently than these earlier architectures.
We’ll see how a transformer model can be used to implement question answering with a pretrained model.
Let’s understand how the transformer model works. It has two main components: an encoder and a decoder. The encoder processes the input data and passes a representation of it to the decoder, which then generates the output sequence. BERT, the model we use below, keeps only the encoder stack and adds a span-prediction head on top for question answering.
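Before walking through the manual steps, here is a minimal sketch using Hugging Face’s high-level pipeline API. This wrapper is not part of the walkthrough below; it simply shows the same SQuAD-finetuned BERT model answering a question in a few lines:

from transformers import pipeline

# High-level question-answering wrapper around the same SQuAD-finetuned BERT.
qa = pipeline("question-answering",
              model="bert-large-uncased-whole-word-masking-finetuned-squad")
result = qa(question="What is the immune system?",
            context="The immune system is a system of many biological structures "
                    "and processes within an organism that protects against disease.")
print(result["answer"])  # the extracted answer span

The manual steps below do the same work explicitly, which makes it easier to see what the model actually receives and returns.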
Suppose we have a question and a relevant paragraph, and we want to extract the answer from the paragraph using a Transformer model. Let’s go through the code step by step:
Import necessary Python libraries and modules needed for text processing and question answering.
import os
import torch
import logging
from transformers import BertForQuestionAnswering, BertTokenizer
import warnings

warnings.filterwarnings("ignore", category=FutureWarning)
We use the pretrained bert-large-uncased-whole-word-masking-finetuned-squad model, fine-tuned on SQuAD v1.1. This model is case-insensitive and trained to answer questions using a context. The tokenizer is used to process input text into tokens.
tokenizer = BertTokenizer.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad')
model = BertForQuestionAnswering.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad')
Input the question and the paragraph from which the answer will be extracted.
Question = "What is the immune system?"
paragraph = "The immune system is a system of many biological structures and processes within an organism that protects against disease. To function properly, an immune system must detect a wide variety of agents, known as pathogens, from viruses to parasitic worms, and distinguish them from the organism's own healthy tissue."
Special tokens are added to mark the beginning of the input ([CLS]), separate the question from the paragraph ([SEP]), and end the input.
question = '[CLS] ' + Question + ' [SEP]'
paragraph = paragraph + ' [SEP]'
The question and paragraph are tokenized into subwords, combined, and converted into numerical IDs that the model can process.
tokens_question = tokenizer.tokenize(question)
tokens_paragraph = tokenizer.tokenize(paragraph)
combined_tokens = tokens_question + tokens_paragraph
token_ids = tokenizer.convert_tokens_to_ids(combined_tokens)
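For illustration, printing the question tokens shows the lowercased WordPiece output together with the special tokens (the exact output may vary slightly with the tokenizer version):

# Indicative output: ['[CLS]', 'what', 'is', 'the', 'immune', 'system', '?', '[SEP]']
print(tokens_question)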
Create a list of segment IDs to differentiate tokens from the question (0) and the paragraph (1).
segment_id = [0] * len(tokens_question)
segment_id += [1] * len(tokens_paragraph)
Convert token IDs and segment IDs into PyTorch tensors to prepare them for the model.
token_ids_tensor = torch.tensor([token_ids])
segment_id_tensor = torch.tensor([segment_id])
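As a side note, the tokenizer can build all of these inputs in one call via encode_plus. This is a minimal sketch of that alternative (not part of the original walkthrough); it expects the raw strings, so raw_paragraph below is a hypothetical name for the paragraph text before '[SEP]' was appended. We keep the manual tensors for the rest of the walkthrough so each step stays visible.

# Alternative: let the tokenizer insert [CLS]/[SEP] and build segment IDs itself.
# raw_paragraph is the original paragraph without the manually added '[SEP]'.
encoded = tokenizer.encode_plus(Question, raw_paragraph, return_tensors="pt")
input_ids = encoded["input_ids"]            # plays the role of token_ids_tensor
token_type_ids = encoded["token_type_ids"]  # plays the role of segment_id_tensor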
Pass the tensors to the model to get start_logits (a score for each token being the start of the answer) and end_logits (a score for each token being the end of the answer).
objects = model(token_ids_tensor, token_type_ids=segment_id_tensor)
starting_scores = objects.start_logits
ending_scores = objects.end_logits
Find the indexes of the tokens with the highest scores, representing the start and end of the answer.
starting_index = torch.argmax(starting_scores)
ending_index = torch.argmax(ending_scores)
Using the starting and ending indexes, extract and display the answer from the tokens.
print("Question: ", Question)print("Answer: ")print(' '.join(combined_tokens[starting_index:ending_index+1]))
When you run the code, the model processes the input and outputs:
Question:  What is the immune system?
Answer:
the immune system is a system of many biological structures and processes within an organism that protects against disease .
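Because BERT uses WordPiece subwords, joining tokens with spaces can leave "##" fragments and detached punctuation for other inputs. As an optional cleanup (a sketch, not part of the original code), the tokenizer can merge the answer tokens back into readable text:

# Merge WordPiece pieces (e.g., 'para', '##sit', '##ic') into plain text.
answer_tokens = combined_tokens[starting_index:ending_index + 1]
print(tokenizer.convert_tokens_to_string(answer_tokens))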
This detailed breakdown explains the working of the Transformer model for question answering using the bert-large-uncased-whole-word-masking-finetuned-squad pretrained model.
Here is the complete implementation of the steps we discussed above.
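The following is a consolidated, runnable sketch of those snippets as a single script (exact outputs may vary slightly across transformers versions):

import torch
import warnings
from transformers import BertForQuestionAnswering, BertTokenizer

warnings.filterwarnings("ignore", category=FutureWarning)

# Load the SQuAD-finetuned BERT model and its tokenizer.
tokenizer = BertTokenizer.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad')
model = BertForQuestionAnswering.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad')

Question = "What is the immune system?"
paragraph = ("The immune system is a system of many biological structures and "
             "processes within an organism that protects against disease. To "
             "function properly, an immune system must detect a wide variety of "
             "agents, known as pathogens, from viruses to parasitic worms, and "
             "distinguish them from the organism's own healthy tissue.")

# Mark the input boundaries with BERT's special tokens.
question = '[CLS] ' + Question + ' [SEP]'
paragraph = paragraph + ' [SEP]'

# Tokenize, combine, and map tokens to vocabulary IDs.
tokens_question = tokenizer.tokenize(question)
tokens_paragraph = tokenizer.tokenize(paragraph)
combined_tokens = tokens_question + tokens_paragraph
token_ids = tokenizer.convert_tokens_to_ids(combined_tokens)

# Segment IDs: 0 for question tokens, 1 for paragraph tokens.
segment_id = [0] * len(tokens_question) + [1] * len(tokens_paragraph)

token_ids_tensor = torch.tensor([token_ids])
segment_id_tensor = torch.tensor([segment_id])

# Predict start/end scores for every token position (no gradients needed).
with torch.no_grad():
    objects = model(token_ids_tensor, token_type_ids=segment_id_tensor)

starting_index = torch.argmax(objects.start_logits)
ending_index = torch.argmax(objects.end_logits)

print("Question: ", Question)
print("Answer: ")
print(' '.join(combined_tokens[starting_index:ending_index + 1]))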
In conclusion, the transformer model represents a prominent advancement in natural language processing because of its scalability, efficiency, and versatility. Developers can leverage this state-of-the-art deep neural architecture for NLP tasks and other areas of machine learning.