Spatial vs. Channel vs. Temporal Attention
Understand the differences between spatial, channel, and temporal attention mechanisms used in vision transformers. Learn how each attention type processes image and video data to capture relationships across pixels, channels, and time frames. Gain practical insights including a basic implementation of spatial self-attention to enhance feature maps for computer vision tasks.
Let's discuss the differences between spatial, channel, and temporal attention mechanisms, starting with spatial attention.
Spatial attention
When working with input feature maps of size
Self-attention feature map generation
Now, consider self-attention feature map generation. Starting with a 3D image tensor
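As a rough illustration of spatial self-attention over a feature map, here is a minimal NumPy sketch. It is not the lesson's own implementation. It assumes a feature map stored as an array of shape (C, H, W), treats each of the H*W spatial positions as a token with C features, and uses randomly initialized projection matrices in place of learned weights. The names `spatial_self_attention`, `w_q`, `w_k`, `w_v`, and the sizes `C`, `H`, `W`, `D` are hypothetical and chosen only for this example.

```python
import numpy as np


def spatial_self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention across spatial positions (illustrative sketch).

    x:   feature map of shape (C, H, W)  (assumed layout)
    w_*: projection matrices of shape (C, D); random stand-ins for learned weights
    Returns the attended feature map (D, H, W) and the attention matrix (H*W, H*W).
    """
    c, h, w = x.shape

    # Flatten the spatial grid: every position becomes one token with C features.
    tokens = x.reshape(c, h * w).T              # (N, C) with N = H * W

    # Project tokens to queries, keys, and values.
    q = tokens @ w_q                            # (N, D)
    k = tokens @ w_k                            # (N, D)
    v = tokens @ w_v                            # (N, D)

    # Scaled dot-product similarity between every pair of positions.
    scores = q @ k.T / np.sqrt(q.shape[-1])     # (N, N)

    # Row-wise softmax, shifted by the row maximum for numerical stability.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    # Each output position is a weighted mix of the values at all positions.
    out = weights @ v                           # (N, D)

    # Restore the spatial layout.
    return out.T.reshape(-1, h, w), weights


# Hypothetical sizes, used only to exercise the function.
C, H, W, D = 8, 4, 4, 8
rng = np.random.default_rng(0)
x = rng.standard_normal((C, H, W))
w_q, w_k, w_v = (0.1 * rng.standard_normal((C, D)) for _ in range(3))

out, attn = spatial_self_attention(x, w_q, w_k, w_v)
print(out.shape, attn.shape)
```

Each row of `attn` holds one position's attention distribution over all H*W positions, so every row sums to 1. In a trained model the projections would be learned, and multi-head attention and positional information would typically be added. They are left out here to keep the sketch short.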