Distillation of the Embedding and Prediction Layers
Learn about the distillation of the embedding and prediction layers of TinyBERT.
Embedding layer distillation
In embedding layer distillation, we transfer knowledge from the embedding layer of the teacher to the embedding layer of the student. Let