Self-Attention vs. Convolution
Explore the differences between self-attention and convolutional methods in computer vision. Learn how self-attention projects image data into key, query, and value vectors to capture global relationships across patches. Understand the computational costs of attention maps and the role of multihead attention in extracting diverse features. Gain insights into self-attention’s dynamic weights versus convolution’s fixed filters, enhancing your grasp of transformer applications in visual tasks.
Let's explore how we can employ self-attention in computer vision.
Comparing self-attention and convolution in computer vision
The process of generating self-attended feature maps involves a series of transformations applied to a 3D image representation, denoted as X, with height H, width W, and C channels.
The first step involves the extraction of three sets of weight matrices, namely the query, key, and value weight matrices (W_Q, W_K, and W_V), which project each input vector into its query, key, and value representations.
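As a minimal NumPy sketch of this projection step: the sizes below (16 flattened patches, 64-dimensional embeddings) and the names W_q, W_k, and W_v are illustrative, not taken from the lesson, and random matrices stand in for learned weights.

```python
import numpy as np

rng = np.random.default_rng(0)

N, d = 16, 64  # 16 flattened patches, each a 64-dim embedding (illustrative sizes)
X = rng.standard_normal((N, d))

# Three weight matrices project every patch into query, key, and value vectors.
# In a trained model these are learned; here they are random placeholders.
W_q = rng.standard_normal((d, d)) / np.sqrt(d)
W_k = rng.standard_normal((d, d)) / np.sqrt(d)
W_v = rng.standard_normal((d, d)) / np.sqrt(d)

Q, K, V = X @ W_q, X @ W_k, X @ W_v
print(Q.shape, K.shape, V.shape)  # each projection keeps shape (16, 64)
```

Note that the same three matrices are shared across all patches, so the number of parameters does not grow with image size; only the number of projected vectors does.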
Imagine the initial 3D image representation X being flattened into a sequence of N = H × W vectors, one per spatial position, so that the query, key, and value projections can be applied to each of them.
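The full pipeline can be sketched end to end, again with illustrative sizes and random stand-in weights: a toy 4×4×8 image is flattened into N = 16 vectors, projected into queries, keys, and values, and compared all-against-all. The resulting attention map is N × N, which is why its cost grows quadratically with the number of patches, and its weights are computed from the input itself, unlike a convolution's fixed filters.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, C = 4, 4, 8                                      # toy image: 4x4 grid, 8 channels
X = rng.standard_normal((H, W, C)).reshape(H * W, C)   # flatten to N = 16 vectors

d = C  # keep the projected dimension equal to the channel count for simplicity
W_q = rng.standard_normal((C, d)) / np.sqrt(C)
W_k = rng.standard_normal((C, d)) / np.sqrt(C)
W_v = rng.standard_normal((C, d)) / np.sqrt(C)

Q, K, V = X @ W_q, X @ W_k, X @ W_v

# The attention map compares every patch with every other patch: an N x N matrix,
# so its cost is quadratic in the number of patches.
scores = Q @ K.T / np.sqrt(d)                                        # (16, 16)
attn = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)   # row-wise softmax

# The self-attended feature map: each output vector is a weighted mix of all values,
# with weights that depend on the input (dynamic), not a fixed learned kernel.
out = attn @ V                                                       # (16, 8)
print(attn.shape, out.shape)
```

Each row of `attn` sums to 1, so every output position is a convex combination of value vectors drawn from the entire image, capturing the global relationships the lesson contrasts with a convolution's fixed local filter.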