Introduction to Self-Supervised Learning
Learn about self-supervised learning, its mathematical framework, and its taxonomy.
What is self-supervised learning?
Self-supervised learning methods are a class of machine learning algorithms that learn rich neural network representations without relying on human-annotated labels. These algorithms derive supervisory signals, or pseudo labels, from the structure of the unlabeled data itself and train the network to predict some unobserved or hidden property of the input.
For example, in computer vision, one can rotate an image by a certain angle and ask the neural network to predict the rotation angle of the picture. In this example, we didn't use human-annotated labels to train the neural network. Instead, we defined our own pseudo labels (i.e., the rotation angle of an image), which serve as supervisory signals. After these supervisory signals or pseudo labels are created, we can use our standard supervised losses (e.g., cross-entropy) to train the neural network.
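To make this concrete, here is a minimal PyTorch sketch of the rotation-prediction pretext task described above. The tiny encoder, the random batch, and all variable names are placeholders for illustration, not part of the lesson's own code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_rotation_batch(images):
    """Create pseudo labels by rotating each image by 0/90/180/270 degrees.

    images: tensor of shape (B, C, H, W)
    returns: rotated images (4B, C, H, W) and rotation labels (4B,)
    """
    rotated, labels = [], []
    for k in range(4):  # k * 90 degrees
        rotated.append(torch.rot90(images, k, dims=(2, 3)))
        labels.append(torch.full((images.size(0),), k, dtype=torch.long))
    return torch.cat(rotated), torch.cat(labels)

# A stand-in encoder; any CNN backbone would do here.
encoder = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
rotation_head = nn.Linear(16, 4)  # predicts one of the 4 rotation angles

optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(rotation_head.parameters()), lr=1e-3
)

images = torch.randn(8, 3, 32, 32)  # stand-in for an unlabeled batch
inputs, pseudo_labels = make_rotation_batch(images)

logits = rotation_head(encoder(inputs))
loss = F.cross_entropy(logits, pseudo_labels)  # standard supervised loss
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Note that the cross-entropy loss here is computed against labels we manufactured ourselves; no human annotation enters the training loop.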
One might confuse self-supervised learning with unsupervised learning (a better-known term). Though both frameworks assume the absence of human-annotated training labels, the term unsupervised learning is ill-defined and often misleading, as it suggests learning without any supervision at all. Self-supervised learning, on the other hand, is not truly unsupervised, since it derives supervisory signals from the structure of the data.
Taxonomy of self-supervised learning
Self-Supervised Learning (SSL) algorithms are classified into four categories based on their objective functions, all of which we'll cover in this course. In addition, multiple classes of self-supervision can be combined to develop stronger algorithms, as we'll see later.
Self-supervised learning framework
Self-supervised learning aims to learn a neural network $f_\theta$ (the feature extractor, with parameters $\theta$) that maps an input $x$ to a useful representation $f_\theta(x)$, using only unlabeled data.
The self-supervised learning framework consists of two steps: pre-training and transfer learning.
Pre-training step
The pre-training step involves training a neural network $f_\theta$ on a large unlabeled dataset $\mathcal{D}_u = \{x_i\}_{i=1}^{N}$.
As discussed in the previous lesson, the self-supervised learning objective will help the neural network learn rich semantic representations by extracting the supervisory signals from the structure of the data itself. Mathematically, this step can be written as:

$$\theta^{*} = \arg\min_{\theta} \; \frac{1}{N} \sum_{i=1}^{N} \mathcal{L}_{\text{ssl}}\big(f_\theta(x_i)\big)$$

Here, $\mathcal{L}_{\text{ssl}}$ is the self-supervised loss computed from the pseudo labels derived from each input $x_i$, and $\theta^{*}$ denotes the learned parameters of the feature extractor.
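As a rough sketch, this objective corresponds to an ordinary training loop over the unlabeled dataset. The `ssl_loss` interface below is an assumption for illustration, standing in for whatever self-supervised objective $\mathcal{L}_{\text{ssl}}$ is used (e.g., the rotation prediction shown earlier):

```python
import torch

def pretrain(encoder, ssl_loss, unlabeled_loader, epochs=10, lr=1e-3):
    """Minimize the self-supervised objective L_ssl over unlabeled data.

    ssl_loss(encoder, batch) is a hypothetical interface: it is assumed
    to build pseudo labels from the batch itself and return a scalar loss.
    """
    optimizer = torch.optim.Adam(encoder.parameters(), lr=lr)
    for _ in range(epochs):
        for batch in unlabeled_loader:  # batches of raw inputs, no labels
            loss = ssl_loss(encoder, batch)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return encoder  # parameters now approximate theta*
```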
Transfer learning
Once the network is trained, its feature representations can be transferred to a downstream task using a small labeled target dataset $\mathcal{D}_t = \{(x_i, y_i)\}_{i=1}^{M}$, where typically $M \ll N$. This transfer is commonly done in one of two ways: by training a linear classifier on top of the frozen features, or by fine-tuning the whole network.
Linear classifier
Keeping the feature extractor $f_{\theta^{*}}$ frozen, we train a linear classifier on top of its representations:

$$\hat{y} = W f_{\theta^{*}}(x) + b$$

Here, $W$ and $b$ are the weight matrix and bias of the linear layer, which are the only trainable parameters. They are optimized with a standard supervised loss on the target dataset:

$$W^{*}, b^{*} = \arg\min_{W, b} \; \frac{1}{M} \sum_{i=1}^{M} \mathcal{L}_{\text{sup}}\big(W f_{\theta^{*}}(x_i) + b, \; y_i\big)$$

Here, $\mathcal{L}_{\text{sup}}$ is a supervised loss such as cross-entropy. Because the features are frozen, the classifier's accuracy directly reflects the quality of the learned representations.
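A minimal PyTorch sketch of this protocol follows. The encoder, the feature dimension, and the tiny labeled loader are hypothetical stand-ins for a real pre-trained network and target dataset:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-ins for illustration: a "pre-trained" encoder f_theta* and a tiny
# labeled target dataset D_t (both hypothetical, not from the lesson).
feat_dim, num_classes = 16, 10
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, feat_dim))
labeled_loader = [(torch.randn(8, 3, 32, 32),
                   torch.randint(0, num_classes, (8,)))]

# Freeze the feature extractor: only the linear layer (W, b) is trained.
for p in encoder.parameters():
    p.requires_grad = False
encoder.eval()

linear = nn.Linear(feat_dim, num_classes)
optimizer = torch.optim.SGD(linear.parameters(), lr=1e-2)

for x, y in labeled_loader:
    with torch.no_grad():
        features = encoder(x)                    # frozen f_theta*(x)
    loss = F.cross_entropy(linear(features), y)  # supervised loss L_sup
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```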
Fine-tuning
Fine-tuning is when the weights of a trained neural network are used as initialization and optimized further (only for a few epochs and with a small learning rate) on a target downstream task that usually has few labeled samples. This is unlike regular training, where we train the neural network from scratch on a large number of data points.
By using the weights of the feature extractor $f_{\theta^{*}}$ as initialization, both the encoder and the classification head are optimized jointly on the target dataset:

$$\theta^{*}_{t}, \phi^{*} = \arg\min_{\theta, \phi} \; \frac{1}{M} \sum_{i=1}^{M} \mathcal{L}_{\text{sup}}\big(g_\phi(f_\theta(x_i)), \; y_i\big), \quad \theta \text{ initialized with } \theta^{*}$$

Here, $g_\phi$ is the classification head with parameters $\phi$, and $\theta^{*}_{t}$ denotes the fine-tuned parameters of the feature extractor.
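In code, fine-tuning differs from linear probing mainly in that the encoder's weights are also updated, with a small learning rate and for only a few epochs. A sketch under the same hypothetical stand-ins as above:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Same hypothetical stand-ins as in the linear-probing sketch.
feat_dim, num_classes = 16, 10
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, feat_dim))
labeled_loader = [(torch.randn(8, 3, 32, 32),
                   torch.randint(0, num_classes, (8,)))]

head = nn.Linear(feat_dim, num_classes)  # classification head g_phi
encoder.train()  # encoder weights (initialized from theta*) stay trainable

# Small learning rate so the pre-trained weights are only nudged, not erased.
optimizer = torch.optim.SGD(
    list(encoder.parameters()) + list(head.parameters()), lr=1e-4
)

for epoch in range(3):  # only a few epochs
    for x, y in labeled_loader:
        logits = head(encoder(x))  # both theta and phi receive gradients
        loss = F.cross_entropy(logits, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```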