Barlow Twins
Learn about redundancy reduction and a widely used redundancy reduction algorithm, Barlow Twins.
Redundancy reduction
Large neural networks suffer from the problem of redundant representations: some components of the final representation of an input duplicate one another or collapse to trivial constants. Such redundancy is often detrimental to the model's performance because it hinders the network from extracting the relevant information from the input.
From a biological perspective, neurons in the brain communicate with each other by sending electrical signals in the form of spikes. Horace Barlow (December 8, 1921–July 5, 2020), a British vision scientist, proposed the efficient coding hypothesis: sensory information carried by spikes in the sensory system is encoded in a neural code that minimizes the number of spikes needed to transmit a given signal.
For example, as shown in the figure above, the first (“This is a bear”), second (“Clearly not a dog!”), and third (“Four legs and a tail”) features are redundant: they convey no additional information beyond one another and instead create confusion, which can harm the model's performance.
A self-supervised algorithm should learn representations that are invariant to input distortions while avoiding constant, trivial, or redundant solutions. Removing such redundant information from the network's representations is therefore important for efficient learning.
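As a rough illustration of what redundancy looks like in practice, the snippet below computes the correlation matrix between the dimensions of a batch of embeddings: large off-diagonal entries signal redundant (highly correlated) dimensions, and near-zero variance signals a trivially constant dimension. The tensor names and sizes are illustrative, not taken from any particular model.

```python
import torch

# Hypothetical batch of embeddings from some encoder: (batch_size, feature_dim)
z = torch.randn(256, 128)

# Standardize each feature dimension across the batch
z_norm = (z - z.mean(dim=0)) / (z.std(dim=0) + 1e-8)

# Correlation matrix between feature dimensions: (feature_dim, feature_dim)
corr = (z_norm.T @ z_norm) / z.shape[0]

# Large off-diagonal entries indicate redundant (highly correlated) dimensions;
# dimensions of `z` with near-zero variance are (nearly) constant outputs.
off_diag = corr - torch.diag(torch.diagonal(corr))
print("max |off-diagonal correlation|:", off_diag.abs().max().item())
```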
We will now discuss a popular self-supervised learning algorithm, Barlow Twins, which is built on this principle of redundancy reduction.
Barlow Twins architecture
Barlow’s redundancy-reduction principle is applied to a pair of identical networks that share the same set of weights. The method brings the cross-correlation matrix between the representations of the two networks (each fed a different distorted view of the same sample) as close as possible to the identity matrix.
This makes the feature representations of the distorted views of an image similar while reducing the redundant components in the learned representations. The figure below illustrates the concept of Barlow Twins.
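To make the idea concrete, here is a minimal sketch of a Barlow-Twins-style objective in PyTorch, assuming `z_a` and `z_b` are the network outputs for two distorted views of the same batch. The function name, normalization details, and weighting coefficient below are illustrative; refer to the original paper and its released code for the exact formulation.

```python
import torch

def barlow_twins_loss(z_a: torch.Tensor, z_b: torch.Tensor,
                      lambda_coeff: float = 5e-3) -> torch.Tensor:
    """Barlow-Twins-style loss for two embedding batches of shape (batch_size, feature_dim)."""
    n, d = z_a.shape

    # Normalize each embedding dimension across the batch (zero mean, unit variance)
    z_a = (z_a - z_a.mean(dim=0)) / (z_a.std(dim=0) + 1e-8)
    z_b = (z_b - z_b.mean(dim=0)) / (z_b.std(dim=0) + 1e-8)

    # Cross-correlation matrix between the two views: (feature_dim, feature_dim)
    c = (z_a.T @ z_b) / n

    # Invariance term: push diagonal entries toward 1 (the two views agree per dimension)
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()
    # Redundancy-reduction term: push off-diagonal entries toward 0 (decorrelate dimensions)
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()

    return on_diag + lambda_coeff * off_diag

# Example usage with random embeddings standing in for two distorted views of a batch
z_a = torch.randn(256, 128)
z_b = torch.randn(256, 128)
print(barlow_twins_loss(z_a, z_b))
```

The diagonal term encourages the two views to produce the same (invariant) features, while the off-diagonal term decorrelates different feature dimensions, which is precisely the redundancy reduction discussed above.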
More specifically, given an image batch,