Introduction to Similarity Maximization

Learn about similarity maximization–based self-supervised learning.

We previously discussed why pretext task-based self-supervised pre-training is not suitable for every downstream task: there is a mismatch between what the pretext task solves and what the transfer task needs. In this chapter, we will learn about similarity maximization, a popular and widely used self-supervised paradigm that addresses the limitations of pretext task-based self-supervised learning.

What do we want from pre-trained features?

Fundamentally, after the pre-training step, we want the trained features to satisfy two important properties:

  • Capture semantics: We want them to represent how images relate to each other, for example, whether a pair of images is similar and to what extent.

  • Robustness: We want them to be robust, or invariant, to “nuisance factors” such as noise, data augmentation, and occlusions (a short sketch after this list shows how both properties can be probed).
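
The following is a minimal sketch of how these two properties could be probed, assuming some pre-trained feature extractor and a stochastic augmentation pipeline; the names `encoder`, `augment`, `img_a`, and `img_b` are placeholders for illustration, not part of the lesson.

```python
import torch
import torch.nn.functional as F

def check_properties(encoder, augment, img_a, img_b):
    """Probe the two desired properties of pre-trained features."""
    with torch.no_grad():
        z_a1 = encoder(augment(img_a))   # two augmented views of image A
        z_a2 = encoder(augment(img_a))
        z_b  = encoder(augment(img_b))   # a view of a different image B

    # Robustness: features of two augmented views of the same image
    # should be nearly identical (high cosine similarity).
    invariance = F.cosine_similarity(z_a1, z_a2, dim=-1)

    # Semantics: similarity across different images should reflect how
    # related they are (high for semantically similar images, lower otherwise).
    cross_image = F.cosine_similarity(z_a1, z_b, dim=-1)

    return invariance, cross_image
```

A well pre-trained encoder should score `invariance` close to 1 while `cross_image` varies with how semantically related the two images are.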

A trivial solution

Given a neural network, $f(\cdot)$, we want to learn features that are robust to data augmentation, that is, $f(T_1(X_i)) = f(T_2(X_i))$. Here, $T_j(\cdot)$ is a data augmentation strategy, and $X_i$ is the input image. Notice, however, that this constraint alone admits a trivial solution: an encoder that maps every input to the same constant vector satisfies it exactly while capturing no information about the data.
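
To make the trivial solution concrete, here is a minimal PyTorch sketch; the augmentations `t1` and `t2` and the `ConstantEncoder` module are illustrative names, not from the lesson. An encoder that ignores its input meets the invariance constraint perfectly, yet its features are useless for any downstream task.

```python
import torch
import torch.nn as nn

# Illustrative stand-ins for a real augmentation pipeline.
def t1(x):  # e.g., add Gaussian noise
    return x + 0.1 * torch.randn_like(x)

def t2(x):  # e.g., randomly mask out part of the input
    mask = (torch.rand_like(x) > 0.2).float()
    return x * mask

class ConstantEncoder(nn.Module):
    """A 'collapsed' encoder: returns the same vector for every input."""
    def __init__(self, dim=128):
        super().__init__()
        self.constant = nn.Parameter(torch.randn(dim))

    def forward(self, x):
        # Same output for every input => f(T1(x)) == f(T2(x)) trivially.
        return self.constant.expand(x.shape[0], -1)

x = torch.randn(8, 3 * 32 * 32)   # a batch of flattened images
f = ConstantEncoder()
z1, z2 = f(t1(x)), f(t2(x))
print(torch.allclose(z1, z2))     # True: invariance holds, but the features
                                  # carry no information about the inputs
```

This collapse is exactly what similarity maximization methods must guard against, which motivates the additional machinery introduced later in the chapter.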
