Local vs. Global Attention
Explore the distinctions between global and local attention mechanisms, and uncover the efficiency and dynamic nature of local attention.
We've previously explored global attention mechanisms, which establish connections across all inputs, whether spatial, channel-related, or temporal. Now, let's turn to another critical aspect: local attention.
Local attention mechanism
As we know, convolution is a local operation: its inductive bias, or modeling assumption, restricts each output to a neighborhood of the input. Attention, in contrast, is a global operation, largely free of modeling assumptions and therefore low in inductive bias. Spatial attention, as depicted, links each blue pixel in space to a red pixel, capturing their relationship through an attention map. This is known as non-local attention, although other designs are possible.
The matrix in the illustration above represents the attention distribution within a spatial context. Each element of the matrix corresponds to a position in the input space, and the strength of the connection between positions is conveyed by the color scale.
The gray matrix in the lower middle shows a non-local attention pattern. Unlike local operations such as convolution, where interactions are confined to a specific neighborhood, non-local attention allows every position in the input space to interact with every other position, without restriction.
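To make this unrestricted pattern concrete, here is a minimal sketch of non-local spatial attention over a feature map, assuming a PyTorch tensor of shape (batch, channels, height, width). The function name and shapes are illustrative assumptions, not the API of any particular library.

```python
import torch
import torch.nn.functional as F

def non_local_spatial_attention(x: torch.Tensor) -> torch.Tensor:
    """Every spatial position attends to every other spatial position."""
    b, c, h, w = x.shape
    n = h * w                                # number of spatial positions
    flat = x.view(b, c, n)                   # (b, c, n)
    q = flat.transpose(1, 2)                 # queries: (b, n, c)
    k = flat                                 # keys:    (b, c, n)
    v = flat.transpose(1, 2)                 # values:  (b, n, c)

    # (b, n, n) attention map: the unrestricted, all-pairs pattern that the
    # gray matrix in the illustration depicts.
    attn = F.softmax((q @ k) / c ** 0.5, dim=-1)

    out = attn @ v                           # (b, n, c)
    return out.transpose(1, 2).reshape(b, c, h, w)

feature_map = torch.randn(1, 64, 16, 16)
print(non_local_spatial_attention(feature_map).shape)  # torch.Size([1, 64, 16, 16])
```

Note that the attention map has one row and one column per spatial position, so its size grows quadratically with the number of positions; this is exactly the cost that local attention is designed to reduce.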
Non-local attention
In the world of self-attention mechanisms, two fundamental design approaches emerge: global self-attention and local self-attention.
Global self-attention, as the name implies, is not constrained by the size of the input feature map. It encompasses the entire feature map, allowing each position to attend to every other position within the map. ...
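As a rough comparison, the sketch below contrasts global self-attention with a simple local, sliding-window variant on a 1-D sequence. The window size and helper names are assumptions chosen for illustration, not the design of any specific model; local attention here is simply global attention with a mask that hides distant pairs.

```python
import torch
import torch.nn.functional as F

def self_attention(q, k, v, mask=None):
    """Scaled dot-product attention; an optional boolean mask limits reach."""
    scores = (q @ k.transpose(-2, -1)) / q.shape[-1] ** 0.5   # (n, n) scores
    if mask is not None:
        scores = scores.masked_fill(~mask, float("-inf"))     # hide far pairs
    return F.softmax(scores, dim=-1) @ v

def local_window_mask(n: int, window: int) -> torch.Tensor:
    # True where |i - j| <= window, so each position sees only its neighbors.
    idx = torch.arange(n)
    return (idx[:, None] - idx[None, :]).abs() <= window

n, d = 32, 16
q = k = v = torch.randn(n, d)

global_out = self_attention(q, k, v)                          # all n^2 pairs
local_out = self_attention(q, k, v, local_window_mask(n, window=2))
print(global_out.shape, local_out.shape)
```

In the global call every position attends to all n positions, while the masked call confines each position to a small neighborhood, which is the essential trade the rest of this lesson explores.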