Teacher-Student Architecture

Learn about the teacher-student architecture of TinyBERT and how distillation happens in the TinyBERT model.

In TinyBERT, we use a two-stage learning framework where we apply distillation in both the pre-training and fine-tuning stages.
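To make the teacher-student idea concrete, here is a minimal sketch of a generic distillation loss, where a student is penalized for disagreeing with the teacher's softened output distribution. This is an illustration of knowledge distillation in general and not TinyBERT's full objective; the function names, the temperature parameter, and the logits are all hypothetical values chosen for the example.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits to probabilities, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def soft_cross_entropy(teacher_logits, student_logits, temperature=1.0):
    """Cross-entropy of the student's distribution against the teacher's soft targets."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))

# Hypothetical logits for one example over three classes (assumed values).
teacher_logits = [2.0, 1.0, 0.1]
close_student = [1.9, 1.1, 0.0]   # roughly agrees with the teacher
far_student = [0.1, 1.0, 2.0]     # ranks the classes in the opposite order

loss_close = soft_cross_entropy(teacher_logits, close_student)
loss_far = soft_cross_entropy(teacher_logits, far_student)
print(loss_close < loss_far)
```

A student whose predictions track the teacher's gets a lower loss than one that disagrees, which is the signal that drives the student toward the teacher's behavior during training.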


Before looking at exactly how TinyBERT works, let's go over the premise and notation used. The following figure shows the teacher and student BERT:
