Use Case: Using BERT to Answer Questions
Learn about the Hugging Face transformers library and explore the dataset we'll use to answer questions with BERT.
Using BERT to answer questions
Now, let’s learn how to implement BERT, train it on a question-answer dataset, and ask the model to answer a given question.
Introduction to the Hugging Face transformers library
We will use the transformers library built by Hugging Face. The transformers library is a high-level API built on top of TensorFlow, PyTorch, and JAX. It provides easy access to pretrained transformer models that can be downloaded and fine-tuned with ease. We can find models in Hugging Face's model registry, where we can filter models by task, examine the underlying deep learning frameworks, and more.
The transformers library was designed with the aim of providing a very low barrier to entry for using complex transformer models. For this reason, there are only a handful of concepts we need to learn in order to get going with the library. Three important classes are required to load and use a model successfully:
- Model class (such as TFBertModel): Contains the trained weights of the model in the form of a tf.keras.models.Model or its PyTorch equivalent.
- Configuration (such as BertConfig): Stores the various parameters and hyperparameters needed to load the model. If we're using the pretrained model as is, we don't need to explicitly define its configuration.
- Tokenizer (such as BertTokenizerFast): Contains the vocabulary and the token-to-ID mapping needed to tokenize the words for the model.
All of these classes can be used with two straightforward functions, demonstrated in the sketch after this list:

- from_pretrained(): Provides a way to instantiate a model/configuration/tokenizer from the model repository or from a local directory.
- save_pretrained(): Provides a way to save the model/configuration/tokenizer so that it can be reloaded later.
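To make these ideas concrete, here is a minimal sketch that loads the configuration, tokenizer, and model for the publicly available bert-base-uncased checkpoint and saves them back to disk. It assumes the transformers library and TensorFlow are installed; the local directory name my_bert is arbitrary.

```python
# A minimal sketch of the three classes and the two functions described above.
# Assumes the transformers library and TensorFlow are installed;
# "bert-base-uncased" is a pretrained checkpoint from the model registry,
# and the directory name "my_bert" is an arbitrary choice.
from transformers import BertConfig, BertTokenizerFast, TFBertModel

# from_pretrained() downloads (or reads from the local cache) each artifact.
config = BertConfig.from_pretrained("bert-base-uncased")
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = TFBertModel.from_pretrained("bert-base-uncased")

# save_pretrained() writes each artifact to disk so that it can be
# reloaded later by pointing from_pretrained() at the same directory.
tokenizer.save_pretrained("my_bert")
model.save_pretrained("my_bert")
```

Note that because we load the pretrained checkpoint as is, we never need to touch config explicitly; it's instantiated here only to illustrate the class.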
Note: TensorFlow hosts a variety of transformer models (released by both TensorFlow and third parties) on TensorFlow Hub, which can be used with the raw TensorFlow API to implement a model such as BERT.
We'll soon see how these classes and functions are used in an actual use case. It's also important to note the side effects of having such an easy-to-grasp interface. Because the transformers library serves the very specific purpose of providing access to transformer models built with TensorFlow, PyTorch, or JAX, it doesn't offer the modularity or flexibility found in TensorFlow itself. In other words, we can't use the transformers library the same way we would use TensorFlow to build a tf.keras.models.Model out of tf.keras.layers.Layer objects.
Exploring the data
The dataset we're going to use for this task is a popular question-answering dataset called SQuAD (Stanford Question Answering Dataset). Each data point consists of four items:

- A question
- A context that may contain the answer to the question
- The start index of the answer within the context
- The answer
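For concreteness, a single record follows the standard squad schema used by Hugging Face's datasets library (introduced next). The field names below are from that schema; the values are illustrative placeholders, not real dataset entries.

```python
# The shape of one SQuAD record as exposed by the datasets library.
# Field names follow the standard "squad" schema; the values are
# illustrative placeholders rather than actual dataset entries.
example = {
    "id": "0001",
    "title": "BERT",
    "question": "What does BERT stand for?",
    "context": "BERT (Bidirectional Encoder Representations from "
               "Transformers) is a transformer-based language model ...",
    "answers": {
        "text": ["Bidirectional Encoder Representations from Transformers"],
        "answer_start": [6],  # character offset of the answer in the context
    },
}
```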
We can download the dataset using Hugging Face's datasets library and call the load_dataset() function with the "squad" argument.
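A minimal sketch of this call, assuming the datasets library has been installed (e.g., pip install datasets):

```python
# Downloading SQuAD through Hugging Face's datasets library.
from datasets import load_dataset

# load_dataset("squad") returns a DatasetDict containing the "train"
# and "validation" splits of the dataset.
dataset = load_dataset("squad")

print(dataset)              # summary of the splits and their sizes
print(dataset["train"][0])  # inspect the first training record
```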