Masked Autoencoders: Decoder and Loss Function
Learn how to implement the decoder layer and loss function of Masked Autoencoders (MAE).
Decoder
The input to the MAE decoder consists of the full set of tokens, that is:
Encoded visible patches, and
Mask tokens.
Similar to SimMIM, a shared, learned mask token vector is used as a substitute for the missing (masked) patches in the input. The full set of tokens is then passed through a transformer network containing self-attention layers.
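The snippet below is a minimal sketch (assuming PyTorch) of how this full decoder input can be assembled: the encoded visible tokens are combined with copies of a shared, learned mask token, unshuffled back to their original patch positions, and given positional information before entering the self-attention layers. Names such as `MAEDecoderInput` and `ids_restore` are illustrative assumptions, not part of an official implementation.

```python
import torch
import torch.nn as nn


class MAEDecoderInput(nn.Module):
    def __init__(self, decoder_dim: int, num_patches: int):
        super().__init__()
        # One shared, learned vector stands in for every masked patch.
        self.mask_token = nn.Parameter(torch.zeros(1, 1, decoder_dim))
        # Positional embeddings for all patch positions (learned here for simplicity).
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, decoder_dim))

    def forward(self, visible_tokens: torch.Tensor, ids_restore: torch.Tensor) -> torch.Tensor:
        # visible_tokens: (B, N_visible, D) encoded visible patches from the encoder
        # ids_restore:    (B, N) indices that undo the random shuffle used for masking
        B, N_visible, D = visible_tokens.shape
        N = ids_restore.shape[1]
        # Append one mask token for every masked position.
        mask_tokens = self.mask_token.expand(B, N - N_visible, D)
        tokens = torch.cat([visible_tokens, mask_tokens], dim=1)
        # Unshuffle so each token sits at its original patch position.
        tokens = torch.gather(tokens, 1, ids_restore.unsqueeze(-1).expand(-1, -1, D))
        # Add positional information before the self-attention layers.
        return tokens + self.pos_embed
```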
The goal of the MAE decoder is to perform the image reconstruction task. Note that the decoder is only used during pre-training; only the encoder is carried over to the transfer learning step. The design of the decoder is flexible, and you can opt for a shallow decoder to keep the training overhead minimal.
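As a rough illustration of such a shallow design, the sketch below stacks a small number of standard self-attention blocks and ends with a linear head that maps each token back to the flattened pixels of its patch. The depth, width, and patch size used here are assumed example values, not prescribed settings; after pre-training this whole module would simply be discarded.

```python
import torch.nn as nn


class ShallowMAEDecoder(nn.Module):
    def __init__(self, decoder_dim: int = 512, depth: int = 2,
                 num_heads: int = 8, patch_size: int = 16, in_chans: int = 3):
        super().__init__()
        block = nn.TransformerEncoderLayer(
            d_model=decoder_dim, nhead=num_heads,
            dim_feedforward=4 * decoder_dim, batch_first=True)
        # A shallow stack of self-attention blocks keeps pre-training overhead small.
        self.blocks = nn.TransformerEncoder(block, num_layers=depth)
        self.norm = nn.LayerNorm(decoder_dim)
        # Linear head predicts the flattened pixel values of each patch.
        self.head = nn.Linear(decoder_dim, patch_size * patch_size * in_chans)

    def forward(self, tokens):
        # tokens: (B, N, decoder_dim) full token sequence assembled as shown above
        x = self.blocks(tokens)
        x = self.norm(x)
        return self.head(x)  # (B, N, patch_size**2 * in_chans)
```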