Introduction to Pretext Tasks
Learn about self-supervised learning and pretext tasks.
Pretext tasks
Using pretext tasks is one of the earliest and most popular ways to train a model with self-supervision. Generally, a pretext task is defined beforehand, and pseudo labels for it are generated automatically from the unlabeled data itself.
Formally, given a source unlabeled dataset $D = \{x_i\}_{i=1}^{N}$, a pretext task applies a known transformation to each sample and uses the transformation's parameters as pseudo labels $\tilde{y}_i$. A model $f_\theta$ is then trained to minimize

$$\min_{\theta} \frac{1}{N} \sum_{i=1}^{N} \mathcal{L}\big(f_\theta(x_i), \tilde{y}_i\big)$$

Here, $x_i$ denotes the $i$-th (possibly transformed) input sample and $\tilde{y}_i$ its automatically generated pseudo label.
Here, $\mathcal{L}$ is a standard supervised loss, such as cross-entropy.
Once the model is trained, its features can be transferred to downstream tasks such as classification, segmentation, and detection. The figure below shows the self-supervised pretext task training.
In summary, during self-supervised learning:
Pretext tasks are defined and pseudo labels are generated for each training sample.
The model is trained to predict these pseudo labels, given the input.
Once trained, features of the trained model are transferred to the downstream task, where only a small amount of labeled data is available.
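The first step above, generating pseudo labels from unlabeled data, can be sketched in a few lines. The snippet below is a minimal illustration with a toy pretext task (predicting whether an image was horizontally flipped); the data and function names are hypothetical, not any specific library's API.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical unlabeled dataset: 16 toy grayscale "images" of shape 8x8.
unlabeled = rng.random((16, 8, 8))

def make_pretext_dataset(images):
    """Step 1: apply a known transformation to each sample and record
    the transformation's parameter as the pseudo label. Here the (toy)
    pretext task is predicting whether an image was horizontally
    flipped (label 1) or left unchanged (label 0)."""
    inputs, pseudo_labels = [], []
    for img in images:
        flip = int(rng.integers(0, 2))
        inputs.append(np.fliplr(img) if flip else img)
        pseudo_labels.append(flip)
    return np.stack(inputs), np.array(pseudo_labels)

x, y = make_pretext_dataset(unlabeled)
print(x.shape, y.shape)  # one pseudo-labeled pair per training sample
```

A model would then be trained to predict `y` from `x` (step 2) before its features are transferred to the labeled downstream task (step 3).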
Here are some examples of popular pretext tasks from the self-supervised learning literature.
Relative positioning involves predicting the relative spatial arrangement between two image patches.
Solving jigsaw puzzles involves predicting the permutation of a shuffled image.
Image rotation involves predicting the rotation angle of the image.
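To make one of these tasks concrete, the sketch below generates pseudo labels for a simplified jigsaw puzzle: an image is split into a 2x2 grid of patches, the patches are shuffled by a permutation drawn from a small fixed set, and the pseudo label is the index of that permutation. The permutation set and function are illustrative assumptions; published jigsaw methods use larger grids and carefully chosen permutation sets.

```python
import numpy as np

rng = np.random.default_rng(0)

# A small fixed set of patch permutations; the pseudo label is the
# index of the permutation that was applied.
PERMUTATIONS = [(0, 1, 2, 3), (1, 0, 3, 2), (3, 2, 1, 0), (2, 3, 0, 1)]

def jigsaw_pretext(img):
    """Split the image into a 2x2 grid of patches, shuffle them with a
    randomly chosen permutation, and return (shuffled patches, label)."""
    h, w = img.shape
    patches = [img[:h // 2, :w // 2], img[:h // 2, w // 2:],
               img[h // 2:, :w // 2], img[h // 2:, w // 2:]]
    label = int(rng.integers(len(PERMUTATIONS)))
    shuffled = [patches[i] for i in PERMUTATIONS[label]]
    return np.stack(shuffled), label

img = rng.random((8, 8))
x, y = jigsaw_pretext(img)
print(x.shape, y)  # 4 shuffled 4x4 patches and the permutation index
```

The network is then trained to classify which of the permutations was applied, which forces it to learn the spatial relationships between image parts.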
Designing pretext tasks
When training neural networks with pretext task-based self-supervised learning objectives, we assume that the distribution or nature of the actual transfer task is similar to that of the pretext task we're solving. From this follows a second assumption: that solving the pretext task well will also help solve the transfer tasks well.
However, there can be a significant mismatch between what the pretext task solves and what the transfer task requires. Hence, pretext task-based pre-training is not always suitable for every downstream task.
To validate this, we can run an experiment to determine which layer's features of the neural network yield the best performance on a transfer task (by training a single linear classifier on each layer's features). If the last layer's representations are not the best performing, we can conclude that the pretext task is not well aligned with the downstream transfer task and might not be the right task to solve.
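Such a per-layer linear-probe comparison can be sketched as follows. This is a minimal numpy illustration under loud assumptions: the "layer features" are synthetic stand-ins rather than real network activations, and a least-squares classifier on training data is used as a cheap proxy for a properly evaluated linear probe.

```python
import numpy as np

rng = np.random.default_rng(0)
n, classes = 200, 4
labels = rng.integers(0, classes, size=n)

# Synthetic stand-ins for frozen per-layer features of a pre-trained
# network: one layer separates the downstream classes well, the other is
# essentially noise (i.e., over-specialized to the pretext task).
centers = rng.normal(0, 5, size=(classes, 10))
feats_mid = centers[labels] + rng.normal(0, 0.5, size=(n, 10))  # well aligned
feats_last = rng.normal(0, 1, size=(n, 10))                     # poorly aligned

def linear_probe_accuracy(features, labels):
    """Fit a least-squares linear classifier on frozen features and
    report its accuracy -- a cheap proxy for a linear probe."""
    onehot = np.eye(classes)[labels]
    w, *_ = np.linalg.lstsq(features, onehot, rcond=None)
    preds = (features @ w).argmax(axis=1)
    return (preds == labels).mean()

acc_mid = linear_probe_accuracy(feats_mid, labels)
acc_last = linear_probe_accuracy(feats_last, labels)
print(f"mid-layer probe: {acc_mid:.2f}, last-layer probe: {acc_last:.2f}")
```

If the probe on an intermediate layer beats the probe on the last layer, as it does here by construction, that is the signal that the pretext task's final representations have drifted away from what the transfer task needs.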
To understand this, let's look at an example. First, we train a ResNet in a self-supervised manner to solve jigsaw puzzles, which asks the network to predict the permutation of a shuffled jigsaw image. Then, we plot the mean Average Precision (mAP) on the y-axis when each layer's representations of the ResNet are transferred to the PASCAL Visual Object Classes dataset. As shown in the figure below, the last layer representations of the ResNet become so specialized for the jigsaw problem that they don't generalize well to the downstream classification task on the PASCAL Visual Object Classes dataset.
Therefore, we must carefully choose our self-supervised pre-training tasks to align well with the downstream transfer tasks.