Jigsaw puzzles
Similar to predicting the relative position of patches, this pretext task asks a neural network to solve jigsaw puzzles so that it learns a visuospatial representation of the objects in an image. As shown in the figure below, the input image $X_i$ is first split into a 3×3 grid. Its nine patches, $X_i = [X_i^{p_1}, X_i^{p_2}, \ldots, X_i^{p_9}]$, are then shuffled according to a permutation $p$ drawn at random from a predefined set of permutations $P$. The shuffled image $\text{permute}(X_i, p)$ is passed through the neural network $f(\cdot)$, which must predict which permutation from $P$ was applied, i.e., $f(\text{permute}(X_i, p)) = p$. Mathematically, the network solves a $|P|$-way classification problem and minimizes the following loss function: