...

The WordPiece Tokenizer

Learn about the WordPiece tokenizer and how it works.

BERT uses a subword tokenizer called the WordPiece tokenizer. Instead of treating every word as a single token, it keeps words that are in its vocabulary intact and splits other words into smaller subword units that are in the vocabulary. Let's see how this works with an example. Consider the following sentence:

Tokenize the sentence

If we tokenize this sentence with the WordPiece tokenizer, we obtain the following tokens:

Notice that when the WordPiece tokenizer processes the sentence, the word 'pretraining' is split into the ...
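To make the splitting step concrete, here is a minimal sketch of WordPiece inference using greedy longest-match-first, which is the procedure BERT's reference implementation describes. For each word, it repeatedly takes the longest prefix of the remaining characters that exists in the vocabulary, marking every piece after the first with the `##` continuation prefix. If no prefix matches, the whole word becomes `[UNK]`. The function names and `TOY_VOCAB` are hypothetical, chosen only for illustration; a real BERT vocabulary has roughly 30,000 entries. The pre-tokenization is also simplified to lowercasing and whitespace splitting.

```python
from typing import List, Set


def wordpiece_tokenize_word(word: str, vocab: Set[str], unk_token: str = "[UNK]") -> List[str]:
    """Split one word into subwords using greedy longest-match-first.

    Any piece that does not start the word carries the '##' prefix,
    following the convention used in BERT's vocabulary.
    """
    pieces: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it from the right.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No prefix of the remaining characters is in the vocabulary,
            # so the whole word is replaced by the unknown token.
            return [unk_token]
        pieces.append(match)
        start = end
    return pieces


def wordpiece_tokenize(text: str, vocab: Set[str]) -> List[str]:
    # Simplification (assumption): lowercase and split on whitespace only.
    # BERT's real preprocessing also splits off punctuation, among other steps.
    tokens: List[str] = []
    for word in text.lower().split():
        tokens.extend(wordpiece_tokenize_word(word, vocab))
    return tokens


# Hypothetical toy vocabulary, for illustration only.
TOY_VOCAB = {"start", "the", "model", "pre", "##train", "##ing", "[UNK]"}

if __name__ == "__main__":
    print(wordpiece_tokenize("start pretraining the model", TOY_VOCAB))
    # ['start', 'pre', '##train', '##ing', 'the', 'model']
```

With this toy vocabulary, 'pretraining' is not a vocabulary entry, so the sketch falls back to the longest matching prefix `pre`, then `##train`, then `##ing`. Words that are in the vocabulary, such as `model`, pass through unchanged.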