Distillation: The BYOL Algorithm

Learn about self-supervised learning via distillation and get an overview of the BYOL algorithm.

Distillation as similarity maximization

As shown in the figure below, distillation in general refers to transferring knowledge from a fixed (usually large) model, known as the teacher $f^{\text{teacher}}(\cdot)$, to a smaller one, known as the student $f^{\text{student}}(\cdot)$.
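The idea can be sketched as similarity maximization: the student's output is pushed toward the teacher's output by minimizing a negative cosine similarity, with gradients flowing only through the student. The sketch below is a minimal, hypothetical NumPy illustration (the linear "networks", shapes, and learning rate are all assumptions for demonstration, not the actual BYOL architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical teacher and student "networks": simple linear projections.
W_teacher = rng.normal(size=(8, 4))   # frozen: never updated
W_student = rng.normal(size=(8, 4))   # trained to mimic the teacher

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

x = rng.normal(size=8)                # one input example
z_teacher = x @ W_teacher             # fixed target embedding

def loss(W):
    # Distillation objective: maximize similarity between student and
    # teacher embeddings, i.e. minimize the negative cosine similarity.
    z_student = x @ W
    return -cosine_similarity(z_student, z_teacher)

# One naive finite-difference gradient step on the student only,
# showing that the objective improves as the student matches the teacher.
eps, lr = 1e-4, 0.1
grad = np.zeros_like(W_student)
for i in range(W_student.shape[0]):
    for j in range(W_student.shape[1]):
        W_pert = W_student.copy()
        W_pert[i, j] += eps
        grad[i, j] = (loss(W_pert) - loss(W_student)) / eps

loss_before = loss(W_student)
W_student = W_student - lr * grad     # teacher stays untouched
loss_after = loss(W_student)
```

After the update, `loss_after` is smaller than `loss_before`: the student embedding has moved toward the teacher's. Real distillation setups replace the linear maps with neural networks and use autodiff, but the asymmetry (frozen teacher, trained student) is the same.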
