What is attention?

An overview of attention

Humans do not tend to pay attention sequentially. We don't typically scan a picture starting from a corner––our focus goes to the important features first. For example, when looking at a picture, we often look at the objects in the middle first. Humans pay attention in two ways: conscious attention and unconscious attention.

We would like our neural networks to model conscious attention so that they focus on the important parts of the input first. This is achieved by attention, more commonly known as the attention mechanism. Its effect is like blurring out the less important features.

An example of attention mechanism

How it works

In the simplest terms, the attention mechanism computes a weighted sum of the input features. The weights tell the model how important each feature is. How these weights are calculated is crucial and varies across applications. Here, we explain the attention mechanism in a general form that applies to most applications.

The working of the attention mechanism is divided into two parts:

  • Calculating the attention distribution from the input.

  • Calculating the context vector from the attention distribution.
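These two steps can be sketched end to end in NumPy. The shapes, the dot-product score, and the softmax used here are illustrative choices, not the only options:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy shapes: 4 input positions, 8-dimensional encodings.
keys = rng.standard_normal((4, 8))    # K: encoded inputs
values = keys                         # V: here taken equal to K
query = rng.standard_normal(8)        # q: task-related representation

# Step 1: attention distribution from the input.
# (Dot-product score per key, then softmax as the distribution function.)
energies = keys @ query
weights = np.exp(energies - energies.max())
weights /= weights.sum()              # one weight per input position

# Step 2: context vector as the weighted sum of the values.
context = weights @ values

print(weights.sum())   # the weights form a probability distribution
print(context.shape)   # a single 8-dimensional context vector
```

The sections below unpack each of these steps in turn.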

The attention distribution is what assigns weight to the different inputs. Before calculating it, we first encode our inputs using a neural network. This encoded representation of the input is called the keys ($K$).

The other ingredient used to calculate the attention distribution is a query ($q$). It is a task-related representation that can be a vector or a matrix, depending on the task. A neural network computes the correlation between the keys and the query using a score function $f$. The score function expresses how the keys and the query are related by producing an energy score.

The selection of a good score function is important. The two most common score functions are additive attention and multiplicative attention. Some score functions incorporate learnable parameters. The table below summarizes some of the common score functions.

| Score function | Equation |
| --- | --- |
| Additive | $f(q, k) = v^T \mathrm{act}(W_1 k + W_2 q + b)$ |
| Multiplicative | $f(q, k) = q^T k$ |
| General | $f(q, k) = q^T W k$ |
| Location-based | $f(q, k) = f(q)$ |

Here, $v, W, W_1, W_2, b$ are the learnable parameters and $\mathrm{act}$ is the activation function.
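The score functions in the table can be written directly in NumPy. In this sketch the learnable parameters are random stand-ins, and we assume `tanh` as the activation function:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8
q = rng.standard_normal(d)           # query
k = rng.standard_normal(d)           # a single key

# Stand-ins for the learnable parameters v, W, W1, W2, b.
W1 = rng.standard_normal((d, d))
W2 = rng.standard_normal((d, d))
W = rng.standard_normal((d, d))
v = rng.standard_normal(d)
b = rng.standard_normal(d)

def additive(q, k):
    # f(q, k) = v^T act(W1 k + W2 q + b), with act = tanh here.
    return v @ np.tanh(W1 @ k + W2 @ q + b)

def multiplicative(q, k):
    # f(q, k) = q^T k — no learnable parameters.
    return q @ k

def general(q, k):
    # f(q, k) = q^T W k — multiplicative with a learnable W.
    return q @ W @ k

# Each score function maps a (query, key) pair to a scalar energy score.
print(additive(q, k), multiplicative(q, k), general(q, k))
```

In practice, one score is computed per key, giving a vector of energy scores over the input positions.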

These energy scores are used to compute the attention weights $\alpha$ by applying a distribution function $g$. The choice of $g$ varies according to the application in which we are using the attention mechanism.
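A common choice for the distribution function $g$ is the softmax, which turns arbitrary energy scores into weights that are positive and sum to one:

```python
import numpy as np

def softmax(energies):
    # Subtracting the max before exponentiating is a standard
    # trick for numerical stability; it does not change the result.
    shifted = energies - np.max(energies)
    exp = np.exp(shifted)
    return exp / exp.sum()

scores = np.array([2.0, 1.0, 0.1])   # energy scores from f
alpha = softmax(scores)              # attention weights

print(alpha)          # larger scores receive larger weights
print(alpha.sum())    # the weights sum to 1
```

Other distribution functions (e.g., sparse variants) can be substituted for softmax when the application calls for it.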

Now, we calculate the context vector $c$. To do so, we introduce one more representation, $V$, called the values. The values have a one-to-one correspondence with the keys $K$; in some cases, $V$ is the same as $K$. The context vector $c$ is calculated as:

$$c = \phi(\{\alpha_i\}, \{v_i\})$$

Here, $\phi$ is a function that returns a single vector, which we refer to as the context vector. In most cases, $\phi$ performs a weighted sum, $c = \sum_i \alpha_i v_i$. Our neural network then uses this context vector to make predictions.
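With $\phi$ taken as the weighted sum, the context vector is a one-liner. The weights and values below are made-up numbers chosen so the result is easy to verify by hand:

```python
import numpy as np

def context_vector(alpha, values):
    # phi as a weighted sum: c = sum_i alpha_i * v_i
    return alpha @ values

alpha = np.array([0.7, 0.2, 0.1])    # attention weights (sum to 1)
values = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [1.0, 1.0]])      # V: one value vector per key

c = context_vector(alpha, values)
print(c)   # [0.8, 0.3] = 0.7*[1,0] + 0.2*[0,1] + 0.1*[1,1]
```

Note that the result has the dimensionality of a single value vector, regardless of how many inputs were attended over.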

The working of the attention mechanism

Applications

The attention mechanism is one of the most important concepts in deep learning and has many applications. Some of them are given below:

  • Image caption generation

  • Image-based analysis

  • Action recognition

  • Text classification

  • Machine translation

  • Speech recognition

  • Recommendation systems

Models that use attention mechanism

Attention is used extensively in Natural Language Processing (NLP). However, it is not limited to NLP; it is also used in computer vision. Some famous models that use the attention mechanism are as follows:

  • GPT-3: Generative Pre-trained Transformer 3 (GPT-3) is a transformer-based model trained not only to generate text but also to summarize large documents. It is also capable of handling conversational tasks and, perhaps most surprisingly, can write code from comments.

  • The Show, Attend and Tell model: This model was trained to generate captions for images.

  • ViT: The Vision Transformer (ViT) was introduced by Google's research team. This model outperformed conventional ResNets on computer vision tasks.

Copyright ©2024 Educative, Inc. All rights reserved