SimCLR Training Objective

Get introduced to SimCLR’s network architecture and its loss function.

Now that we have two augmented versions of the input batch, $T_1(B)$ and $T_2(B)$, we'll look into the other components of the SimCLR training pipeline.

Network architecture

As shown in the figure below, the two augmented versions of an image $X_i$ (i.e., $T_1(X_i)$ and $T_2(X_i)$) are passed through the neural network $f(\cdot)$ to get the penultimate feature representations $h_{i1}$ and $h_{i2}$, respectively. These feature representations are then passed through a multilayer perceptron (MLP) projection head $g(\cdot)$ to get the feature embeddings $z_{i1}$ and $z_{i2}$, respectively. The same $f(\cdot)$ and $g(\cdot)$ are applied to both views.
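The data flow described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the lesson's implementation: the encoder here is a single dense layer standing in for a deep network, and the layer sizes, the weight initialization, and the names `init_linear`, `forward`, `view_1`, and `view_2` are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
IMAGE_DIM, FEATURE_DIM, EMBED_DIM = 32 * 32 * 3, 512, 128


def init_linear(in_dim, out_dim):
    """Return (weights, bias) for a dense layer with small random weights."""
    return rng.normal(0.0, 0.01, size=(in_dim, out_dim)), np.zeros(out_dim)


def relu(x):
    return np.maximum(x, 0.0)


# Toy stand-in for the encoder f(.): one dense layer + ReLU.
# In practice, f would be a deep network; this is an assumption for brevity.
f_w, f_b = init_linear(IMAGE_DIM, FEATURE_DIM)

# MLP projection head g(.): dense -> ReLU -> dense.
g_w1, g_b1 = init_linear(FEATURE_DIM, FEATURE_DIM)
g_w2, g_b2 = init_linear(FEATURE_DIM, EMBED_DIM)


def f(x):
    """Map a batch of images to feature representations h."""
    return relu(x.reshape(len(x), -1) @ f_w + f_b)


def g(h):
    """Map feature representations h to embeddings z."""
    return relu(h @ g_w1 + g_b1) @ g_w2 + g_b2


def forward(view_1, view_2):
    """Apply the same f and g (shared weights) to both augmented views."""
    h1, h2 = f(view_1), f(view_2)
    z1, z2 = g(h1), g(h2)
    return (h1, h2), (z1, z2)


# Random arrays stand in for the augmented batches T_1(B) and T_2(B).
batch_size = 4
view_1 = rng.random((batch_size, 32, 32, 3))
view_2 = rng.random((batch_size, 32, 32, 3))

(h1, h2), (z1, z2) = forward(view_1, view_2)
print(h1.shape, z1.shape)  # (4, 512) (4, 128)
```

Note that both views go through the same weights; there is no separate network per augmentation. The embeddings `z1` and `z2` are the quantities the loss function operates on, while the representations `h1` and `h2` come from the layer just before the projection head.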
