Distillation: The BYOL Algorithm
Explore the BYOL algorithm, which applies distillation to maximize similarity between augmented views through teacher-student networks. Understand how asymmetric update rules and architecture avoid trivial solutions, and see practical augmentation strategies for self-supervised learning.
Distillation as similarity maximization
As shown in the figure below, distillation generally refers to transferring knowledge from a fixed (usually large) model, known as the teacher
Distillation methods can also be seen as similarity maximization–based methods. Just like contrastive learning and clustering, distillation aims to prevent trivial solutions to
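To make the similarity-maximization view concrete, here is a minimal NumPy sketch, not the lesson's reference implementation. It assumes a BYOL-style objective: the loss is computed between L2-normalized student and teacher outputs, and the teacher is refreshed as an exponential moving average (EMA) of the student. The function names (`l2_normalize`, `similarity_loss`, `ema_update`) and the default `tau=0.99` are illustrative assumptions.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row of x to (approximately) unit L2 norm."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def similarity_loss(student_out, teacher_out):
    """Batch mean of 2 - 2 * cosine similarity (hypothetical helper).

    The value is near 0 when student and teacher outputs point the same
    way and near 4 when they point in opposite directions. In a real
    training loop the teacher output is treated as a constant, so no
    gradient flows into the teacher.
    """
    s = l2_normalize(student_out)
    t = l2_normalize(teacher_out)
    return float(np.mean(2.0 - 2.0 * np.sum(s * t, axis=-1)))


def ema_update(teacher_params, student_params, tau=0.99):
    """Move each teacher parameter a small step toward the student's.

    tau is an assumed decay rate: the closer it is to 1, the more slowly
    the teacher changes.
    """
    return [tau * t + (1.0 - tau) * s
            for t, s in zip(teacher_params, student_params)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    student_out = rng.normal(size=(4, 8))  # stand-in for student outputs on view 1
    teacher_out = rng.normal(size=(4, 8))  # stand-in for teacher outputs on view 2
    print("loss on random outputs:", similarity_loss(student_out, teacher_out))

    # A collapsed model that emits the same vector for every input also
    # scores a near-zero loss, which is the trivial solution to avoid.
    constant = np.ones((4, 8))
    print("loss on collapsed outputs:", similarity_loss(constant, constant))
```

The second print shows why the objective alone is not enough: identical constant outputs already minimize it. Under this sketch's assumptions, the asymmetry comes from two places: the teacher receives no gradient, and its parameters change only through `ema_update`.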