Mastering Self-Supervised Algorithms for Learning without Labels/

...

Masked Autoencoders: Masking and Encoder

Learn how to implement the masking strategy and encoder layer of Masked Autoencoders (MAE).

We'll cover the following...

Masking strategy
Encoder

Press + to interact

Python 3.8

import torch
import numpy as np
from PIL import Image
class MaskGenerator:
    def __init__(self, input_size=192, mask_patch_size=16, mask_ratio=0.8):
        self.input_size = input_size
        self.mask_patch_size = mask_patch_size
        self.mask_ratio = mask_ratio
        
        assert self.input_size % self.mask_patch_size == 0
        
        self.rand_size = self.input_size // self.mask_patch_size
        
        self.token_count = self.rand_size ** 2
        self.mask_count = int(np.ceil(self.token_count * self.mask_ratio))
        
    def __call__(self, images):
        mask_ids, ids_keeps, masks = [], [], []
        
        for image in images:
            permutation = np.random.permutation(self.token_count)
            mask_idx, ids_keep = permutation[:self.mask_count], permutation[self.mask_count:]
            mask = np.ones(self.token_count, dtype=int)
            mask[mask_idx] = 0
            
            mask = mask.reshape((self.rand_size, self.rand_size))
            mask = mask.repeat(self.mask_patch_size, axis=0).repeat(self.mask_patch_size, axis=1)
            
            mask_ids.append(mask_idx)
            ids_keeps.append(ids_keep)
            masks.append(mask)
        
        return mask_ids, ids_keeps, masks
images = [Image.open("n02107683_Bernese_mountain_dog.jpeg").resize((224,224)),
         Image.open("cat.jpg").resize((224, 224))]   
images[0].save("./output/image.png")
images = np.stack([np.array(image) for image in images], 0)
generator = MaskGenerator(input_size=224)
mask_ids, ids_keeps, masks = generator(images)
mask = Image.fromarray((255*masks[0]).astype(np.uint8))
mask.save("./output/mask.png")
masked_image = np.expand_dims(masks[0], -1)*images[0]
masked_image = Image.fromarray(masked_image.astype(np.uint8))
masked_image.save("./output/masked_image.png")

Line 5: We define a class, MaskGenerator, which takes a set of images and generates random masks for them.
Lines 7–16: We implement the __init__ function that takes the image’s input_size, mask_patch_size, and masking ratio mask_ratio as input parameters and calculates the total number of patches self.token_count and the number of patches to mask self.mask_count.
Lines 18–29: We implement the __call__ function, which takes a set of images as input and generates masks.
Lines 22–25: We first sample the random patch indexes, mask_idx, which will be masked. The rest of the patch ...

Introduction to Self-Supervised Learning

Pretext Tasks

Similarity Maximization and Redundancy Reduction

Masked Image Modeling

Appendix

Masked Autoencoders: Masking and Encoder

Masking strategy