Using BBPE as a tokenizer

We know that BERT uses the WordPiece tokenizer. WordPiece works similarly to BPE, but it merges symbol pairs based on likelihood rather than frequency. Unlike BERT, RoBERTa uses byte-level byte pair encoding (BBPE) as its tokenizer.

BBPE works very similarly to BPE, but instead of operating on a character-level sequence, it operates on a byte-level sequence. We also know that BERT uses a vocabulary of about 30,000 tokens, whereas RoBERTa uses a vocabulary of about 50,000 tokens. Let's explore the RoBERTa tokenizer further.
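To make the vocabulary difference concrete, here is a minimal sketch, assuming the Hugging Face transformers library (which the text does not name explicitly), that compares the size of BERT's WordPiece vocabulary with RoBERTa's BBPE vocabulary:

```python
# Sketch comparing vocabulary sizes; assumes the Hugging Face transformers
# library and the standard pretrained checkpoints.
from transformers import BertTokenizer, RobertaTokenizer

bert_tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
roberta_tokenizer = RobertaTokenizer.from_pretrained("roberta-base")

print(bert_tokenizer.vocab_size)     # 30522 -> roughly the 30,000-token WordPiece vocabulary
print(roberta_tokenizer.vocab_size)  # 50265 -> roughly the 50,000-token BBPE vocabulary
```

Because the RoBERTa tokenizer merges over bytes rather than characters, any input can be represented without an unknown token; rare or unseen words are simply broken into byte-level subwords.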


Import the necessary modules

First, let's import the necessary modules:
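The exact listing is not reproduced here; a plausible set of imports, assuming the Hugging Face transformers library is used to load the pretrained RoBERTa model and tokenizer, would look like this:

```python
# Sketch of the imports; the specific classes are an assumption based on the
# standard Hugging Face transformers API for RoBERTa.
from transformers import RobertaConfig, RobertaModel, RobertaTokenizer
```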
