TinyBERT
Learn about the TinyBERT variant of BERT based on knowledge distillation.
TinyBERT is another interesting variant of BERT that also uses knowledge distillation. With DistilBERT, we learned how to transfer knowledge from the output layer of the teacher BERT to the student BERT. But apart from this, can we also transfer knowledge from other layers of the teacher BERT? Yes!
In TinyBERT, apart from transferring knowledge from the output layer (prediction layer) of the teacher to the student, we also transfer knowledge from the embedding and encoder layers. ...
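To make this layer-wise transfer concrete, here is a minimal NumPy sketch of the kinds of losses involved: a mean-squared-error loss that matches a student layer's hidden states to the teacher's (with a learned projection matrix `W`, since the student's hidden size is usually smaller), and a soft cross-entropy loss on the prediction layer. The function names, the temperature parameter `T`, and the shapes are illustrative assumptions, not the exact TinyBERT implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def hidden_state_loss(student_h, teacher_h, W):
    # MSE between the student's hidden states, projected into the
    # teacher's (larger) hidden dimension via W, and the teacher's.
    # The same idea applies to the embedding-layer loss.
    return float(np.mean((student_h @ W - teacher_h) ** 2))

def prediction_layer_loss(student_logits, teacher_logits, T=1.0):
    # soft cross-entropy between temperature-scaled distributions,
    # as in standard output-layer distillation
    p_teacher = softmax(teacher_logits / T)
    log_p_student = np.log(softmax(student_logits / T))
    return float(-np.sum(p_teacher * log_p_student, axis=-1).mean())
```

In practice these per-layer losses are summed (with weights) over all matched layer pairs to form the total distillation objective.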