Teacher-Student Architecture
Learn about the teacher-student architecture of TinyBERT and how distillation happens in the TinyBERT model.
In TinyBERT, we use a two-stage learning framework in which distillation is applied in both the pre-training and fine-tuning stages.
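As a rough illustration of the two-stage framework, the following sketch only records the order of the stages; it does not train anything. The names Student, distill, and two_stage_distillation are hypothetical placeholders, and the teacher and data labels (a pre-trained teacher with a general corpus, then a fine-tuned teacher with a task dataset) are assumptions about a typical setup rather than details given in this lesson.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Student:
    # Log of (stage, teacher, data) entries; stands in for real weight updates.
    log: List[Tuple[str, str, str]] = field(default_factory=list)


def distill(student: Student, stage: str, teacher: str, data: str) -> Student:
    """Hypothetical placeholder: records the stage instead of training."""
    student.log.append((stage, teacher, data))
    return student


def two_stage_distillation() -> Student:
    student = Student()
    # Stage 1: distillation applied during pre-training (assumed labels).
    distill(student, "pre-training", "pre-trained teacher BERT", "general corpus")
    # Stage 2: distillation applied again during fine-tuning (assumed labels).
    distill(student, "fine-tuning", "fine-tuned teacher BERT", "task dataset")
    return student


student = two_stage_distillation()
for stage, teacher, data in student.log:
    print(f"{stage}: {teacher} -> student, on {data}")
```

The point of the sketch is only that the same student passes through both stages in order, so whatever it learns during pre-training distillation is the starting point for fine-tuning distillation.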
To understand exactly how TinyBERT works, let's first go over the premise and the notation used. The following figure shows the teacher BERT and the student BERT: