The MoCo-v2 Algorithm

Learn about another widely used contrastive learning algorithm, MoCo-v2.

MoCo-v2 vs. SimCLR

The crux of contrastive learning is to use many negative samples to learn rich representations. The SimCLR algorithm, as we have seen, draws these negatives from within the same batch. Obtaining a large number of negatives therefore requires large batch sizes and substantial computing power, which makes the SimCLR framework computationally heavy.

Momentum Contrast v2 (MoCo-v2) uses a different, computationally efficient approach to generate negatives. As shown in the figure below, MoCo-v2 frames contrastive learning slightly differently, as a query-key matching problem. Think of queries and keys as n-dimensional vectors. Matching a query to a key means aligning the vectors they represent, i.e., maximizing the similarity between them. Unlike SimCLR, MoCo-v2 uses two encoders: one for generating queries, $f^{\text{query}}(\cdot)$, and the other for generating keys, $f^{\text{key}}(\cdot)$.
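To make the two-encoder idea concrete, here is a minimal PyTorch-style sketch (the backbone, dimensions, and momentum value are illustrative assumptions, not the exact MoCo-v2 configuration). The key encoder starts as a copy of the query encoder and is updated only as an exponential moving average of the query encoder's weights, so it changes slowly and never receives gradients:

```python
import copy
import torch

# Query encoder: a stand-in for a real backbone (e.g., a ResNet); the
# 512 -> 128 shapes here are illustrative, not the paper's configuration.
f_query = torch.nn.Sequential(torch.nn.Linear(512, 128))

# Key encoder: starts as an exact copy of the query encoder.
f_key = copy.deepcopy(f_query)
for p in f_key.parameters():
    p.requires_grad = False  # keys are produced without gradient updates

m = 0.999  # momentum coefficient; a large value keeps the key encoder slow

@torch.no_grad()
def momentum_update(f_query, f_key, m):
    # Exponential moving average of the query encoder's weights
    # into the key encoder: key = m * key + (1 - m) * query.
    for pq, pk in zip(f_query.parameters(), f_key.parameters()):
        pk.data.mul_(m).add_(pq.data, alpha=1.0 - m)
```

Because the key encoder drifts slowly, keys computed in earlier batches remain consistent with the current encoder, which is what lets MoCo-v2 reuse them as negatives instead of recomputing everything inside one large batch.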

In this context, aligning a positive pair means matching the query $q_i$ with the key $k_i$, where both come from the same image $X_i$. An encoded query $q_i$ should be similar to its matching key $k_i$ and dissimilar to all other keys $k_{j \neq i}$. The figure below illustrates the idea, and a short loss sketch follows it.
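As a rough sketch of how the query-key matching is scored, the snippet below computes an InfoNCE-style loss: each query's similarity to its own key is treated as the correct class in a softmax over its similarities to all keys. The queue of negative keys (a MoCo design element), the embedding dimension, and the temperature value here are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def info_nce_loss(q, k_pos, queue, temperature=0.07):
    """Sketch of an InfoNCE-style loss for query-key matching.

    q:      (N, D) encoded queries, assumed L2-normalized
    k_pos:  (N, D) matching keys (positives), assumed L2-normalized
    queue:  (K, D) negative keys, e.g., from earlier batches
    """
    # Positive logits: one similarity per (query, matching key) pair -> (N, 1)
    l_pos = torch.einsum("nd,nd->n", q, k_pos).unsqueeze(-1)
    # Negative logits: each query against every queued key -> (N, K)
    l_neg = torch.einsum("nd,kd->nk", q, queue)
    logits = torch.cat([l_pos, l_neg], dim=1) / temperature
    # The positive key sits at index 0 of each row of logits
    labels = torch.zeros(q.size(0), dtype=torch.long)
    return F.cross_entropy(logits, labels)

# Example usage with random, normalized vectors (shapes are illustrative):
q = F.normalize(torch.randn(8, 128), dim=1)
k_pos = F.normalize(torch.randn(8, 128), dim=1)
queue = F.normalize(torch.randn(4096, 128), dim=1)
loss = info_nce_loss(q, k_pos, queue)
```

Minimizing this loss pulls each $q_i$ toward its $k_i$ while pushing it away from every $k_{j \neq i}$, which is exactly the alignment described above.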
