Pre-Training Strategies for the BERT Model

Learn about the different pre-training strategies used to train the BERT model.

Now that we've learned how to tokenize the input using a WordPiece tokenizer and how to convert it into embeddings before feeding it to BERT, let's learn how to pre-train the BERT model.

Pre-training strategies

The BERT model is pre-trained on the following two tasks:

  • Masked language modeling

  • Next sentence prediction
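To make the two tasks a little more concrete before we study them, here is a minimal sketch of what one training example for each task might look like. The helper names `make_mlm_example` and `make_nsp_example`, the sample sentences, and the label strings are hypothetical and chosen only for illustration. The sketch also masks a single token at a fixed position, whereas actual BERT pre-training picks the tokens to mask at random.

```python
MASK = "[MASK]"

def make_mlm_example(tokens, mask_index):
    """Masked language modeling: hide one token; the target is the hidden token."""
    masked = list(tokens)              # copy so the original sentence is untouched
    target = masked[mask_index]
    masked[mask_index] = MASK
    return masked, target

def make_nsp_example(sentence_a, sentence_b, is_next):
    """Next sentence prediction: pair two sentences and label whether B follows A."""
    tokens = ["[CLS]"] + sentence_a + ["[SEP]"] + sentence_b + ["[SEP]"]
    label = "isNext" if is_next else "notNext"   # illustrative label names
    return tokens, label

# Made-up, already tokenized sentences (assumption: one word per token).
sent_a = ["paris", "is", "a", "beautiful", "city"]
sent_b = ["i", "love", "paris"]

mlm_input, mlm_target = make_mlm_example(sent_a, mask_index=4)
nsp_input, nsp_label = make_nsp_example(sent_a, sent_b, is_next=True)

print(mlm_input, "->", mlm_target)
print(nsp_input, "->", nsp_label)
```

In the first task the model sees the masked sentence and has to predict the hidden token. In the second it sees the sentence pair and has to predict the label.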

Let's understand how these two pre-training strategies work by looking at each in turn. Before diving into the masked language modeling task, let's first understand how a language modeling task works.
