Introduction: Pretraining a RoBERTa Model from Scratch
Get an overview of what we will cover in this chapter.
Chapter overview
In this chapter, we will build a RoBERTa model from scratch. We will use the knowledge of transformers we have acquired so far in this course to build, step by step, a model that can perform masked language modeling.
This chapter focuses on pretraining a transformer model from scratch in a Jupyter notebook, using Hugging Face's seamless modules. The model is named KantaiBERT.
KantaiBERT first loads a compilation of Immanuel Kant's books created for this chapter. We will see how the data was obtained and how to create our own datasets for this notebook. KantaiBERT then trains its own tokenizer from scratch, building the merges and vocabulary files that will be used during pretraining. Next, KantaiBERT processes the dataset, initializes a trainer, and trains the model. Finally, KantaiBERT uses the trained model to perform an experimental downstream masked language modeling task, filling a mask using Immanuel Kant's logic.
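To make this workflow concrete, the sketch below outlines the main steps in Python, assuming Hugging Face's tokenizers and transformers libraries, a hypothetical plain-text file kant.txt holding the Kant compilation, and illustrative hyperparameters and sample sentence rather than the chapter's final values.

```python
# A minimal sketch of the KantaiBERT workflow, assuming Hugging Face's
# `tokenizers` and `transformers` libraries and a hypothetical text file
# "kant.txt" containing the Kant compilation. Hyperparameters are illustrative.
import os
from tokenizers import ByteLevelBPETokenizer
from transformers import (RobertaConfig, RobertaTokenizerFast, RobertaForMaskedLM,
                          LineByLineTextDataset, DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments, pipeline)

os.makedirs("KantaiBERT", exist_ok=True)

# 1. Train a byte-level BPE tokenizer from scratch; this writes the
#    vocab.json and merges.txt files used throughout pretraining.
bpe_tokenizer = ByteLevelBPETokenizer()
bpe_tokenizer.train(files=["kant.txt"], vocab_size=52_000, min_frequency=2,
                    special_tokens=["<s>", "<pad>", "</s>", "<unk>", "<mask>"])
bpe_tokenizer.save_model("KantaiBERT")

# 2. Reload the tokenizer for the model and define a small RoBERTa configuration.
tokenizer = RobertaTokenizerFast.from_pretrained("KantaiBERT", model_max_length=512)
config = RobertaConfig(vocab_size=52_000, max_position_embeddings=514,
                       num_attention_heads=12, num_hidden_layers=6, type_vocab_size=1)
model = RobertaForMaskedLM(config=config)

# 3. Build a masked-language-modeling dataset and a trainer, then pretrain the model.
dataset = LineByLineTextDataset(tokenizer=tokenizer, file_path="kant.txt", block_size=128)
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True,
                                           mlm_probability=0.15)
args = TrainingArguments(output_dir="KantaiBERT", num_train_epochs=1,
                         per_device_train_batch_size=64, save_steps=10_000)
trainer = Trainer(model=model, args=args, data_collator=collator, train_dataset=dataset)
trainer.train()
trainer.save_model("KantaiBERT")

# 4. Downstream experiment: fill a masked token with the pretrained model.
fill_mask = pipeline("fill-mask", model="KantaiBERT", tokenizer="KantaiBERT")
print(fill_mask("Human thinking involves human <mask>."))  # illustrative sentence
```

The byte-level BPE tokenizer mirrors the one used by RoBERTa, which is why its merges.txt and vocab.json files can be plugged directly into the pretraining step.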
By the end of the chapter, we will know how to build a transformer model from scratch, and we will have enough knowledge of transformers to face the Industry 4.0 challenge of working with powerful pretrained models, such as GPT-3 engines, that require more than development skills to implement.
This chapter covers the following topics:
- Loading and preparing a custom dataset (a compilation of Immanuel Kant's books)
- Training a tokenizer from scratch and generating its merges and vocabulary files
- Initializing a trainer and pretraining the KantaiBERT model
- Performing a downstream masked language modeling task with the trained model