Use Case: Using BERT to Answer Questions

Learn how to use the Hugging Face transformers library and the SQuAD dataset to answer questions with BERT.

Using BERT to answer questions

Now, let’s learn how to implement BERT, train it on a question-answering dataset, and ask the model to answer a given question.

Introduction to the Hugging Face transformers library

We will use the transformers library built by Hugging Face. The transformers library is a high-level API built on top of TensorFlow, PyTorch, and JAX. It provides easy access to pretrained transformer models that can be downloaded and fine-tuned with ease. We can find models in Hugging Face’s model registry, where we can filter models by task, examine the underlying deep learning frameworks, and more.
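To get a feel for how little code this takes, here is a minimal sketch that downloads a question-answering model from the registry and runs it. The pipeline API is one of several entry points the library offers, and the distilbert-base-cased-distilled-squad checkpoint is an illustrative choice, not one mandated by this lesson:

    from transformers import pipeline

    # Download a question-answering model from the model registry.
    # The checkpoint name is an illustrative choice; any QA model works.
    qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

    result = qa(
        question="What is the transformers library built on?",
        context="The transformers library is a high-level API built on top of "
                "TensorFlow, PyTorch, and JAX.",
    )
    print(result["answer"])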

The transformers library was designed to provide a very low barrier to entry for using complex transformer models. For this reason, there are only a handful of concepts we need to learn in order to get going with the library. Three important classes are required to load and use a model successfully:

  • Model class (such as TFBertModel): Contains the trained weights of the model in the form of tf.keras.models.Model or the PyTorch equivalent.

  • Configuration (such as BertConfig): Stores various parameters and hyperparameters needed to load the model. If we’re using the pretrained model as is, we don’t need to explicitly define its configuration.

  • Tokenizer (such as BertTokenizerFast): Contains the vocabulary and token-to-ID mapping needed to tokenize the words for the model.

All of these classes can be used with two straightforward functions, demonstrated in the sketch after this list:

  • from_pretrained(): Provides a way to instantiate a model/configuration/tokenizer available from the model repository or locally.

  • save_pretrained(): Provides a way to save the model/configuration/tokenizer so that it can be reloaded later.
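Putting these pieces together, here’s a minimal sketch (using the bert-base-uncased checkpoint as an illustrative example) of how the model, configuration, and tokenizer classes work with these two functions:

    from transformers import BertConfig, BertTokenizerFast, TFBertModel

    # Instantiate the tokenizer and model from the model registry
    tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
    model = TFBertModel.from_pretrained("bert-base-uncased")

    # The configuration is loaded implicitly with the model,
    # but it can also be fetched and inspected on its own
    config = BertConfig.from_pretrained("bert-base-uncased")
    print(config.num_hidden_layers)  # 12 for bert-base-uncased

    # Save everything locally so it can be reloaded later
    # with from_pretrained("local-bert")
    model.save_pretrained("local-bert")
    tokenizer.save_pretrained("local-bert")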

Note: A variety of transformer models (released by both TensorFlow and third parties) are hosted on TensorFlow Hub. TensorFlow Hub and the raw TensorFlow API can also be used to implement a model such as BERT.

We’ll soon see how these classes and functions are used in an actual use case. It’s also important to note the trade-off that comes with such an easy-to-grasp interface. Because the transformers library serves the very specific purpose of providing access to transformer models built with TensorFlow, PyTorch, or JAX, it doesn’t offer the modularity or flexibility found in TensorFlow itself. In other words, we can’t use the transformers library the way we would use TensorFlow to build a tf.keras.models.Model out of tf.keras.layers.Layer objects.

Exploring the data

The dataset we’re going to use for this task is a popular question-answering dataset called SQuAD (the Stanford Question Answering Dataset). Each data point consists of four items:

  • A question

  • A context that may contain the answer to the question

  • The start index of the answer

  • The answer

We can download the dataset using Hugging Face’s datasets library by calling the load_dataset() function with the "squad" argument:
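Here’s a minimal sketch (assuming the datasets package is installed alongside transformers):

    from datasets import load_dataset

    # Download SQuAD; this returns a DatasetDict with "train" and "validation" splits
    dataset = load_dataset("squad")

    # Each example carries the fields described above:
    # 'id', 'title', 'context', 'question', and
    # 'answers' (a dict with 'text' and 'answer_start' lists)
    print(dataset["train"][0])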
