The Skip-Gram Algorithm

Learn about the skip-gram Word2vec algorithm.

The first algorithm we’ll talk about is known as the skip-gram algorithm, a type of Word2vec algorithm. As we have discussed in numerous places, the meaning of a word can be elicited from the contextual words surrounding it. However, it isn’t entirely straightforward to develop a model that exploits this way of learning word meanings. The skip-gram algorithm, introduced by Mikolov et al. in 2013, exploits the context in which words appear in written text to learn good word embeddings.

Let’s go through the skip-gram algorithm step by step. First, we’ll discuss the data preparation process. Understanding the format of the data puts us in a great position to understand the algorithm. We’ll then discuss the algorithm itself. Finally, we’ll implement the algorithm using TensorFlow.

From raw text to semistructured text

First, we need to design a mechanism to extract a dataset that can be fed to our learning model. Such a dataset should be a set of tuples of the format (target, context). Moreover, this needs to be created in an unsupervised manner. That is, a human should not have to manually engineer the labels for the data. In summary, the data preparation process should do the following:

  • Capture the surrounding words of a given word (that is, the context).
  • Run in an unsupervised manner.
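
To make this target format concrete, consider a toy sentence and a context window of size $m = 1$. The sentence and variable names below are illustrative assumptions, not part of the algorithm itself:

```python
# Toy sentence, already tokenized.
tokens = ["the", "dog", "barked", "at", "the", "mailman"]

# With a context window size of m = 1, the target word "barked" has the
# context words "dog" and "at", so it contributes these (target, context) tuples:
expected_pairs = [("barked", "dog"), ("barked", "at")]
```

Every word in the text takes its turn as the target word, which is exactly what the following approach formalizes.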

The skip-gram model uses the following approach to design a dataset:

  • For a given word $w_i$, a context window size of $m$ is assumed. By “context window size,” we mean the number of words considered as context on a single side. Therefore, for $w_i$, the context window (including the target word $w_i$) will be of size $2m+1$ and will look like this: $[w_{i-m}, \ldots, w_{i-1}, w_i, w_{i+1}, \ldots, w_{i+m}]$.

  • Next, (target, context) tuples are formed for every word position $i$ in the text: $[\ldots, (w_i, w_{i-m}), \ldots, (w_i, w_{i-1}), (w_i, w_{i+1}), \ldots, (w_i, w_{i+m}), \ldots]$.
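
Putting these steps together, here is a minimal sketch of the pair-generation step in Python. The function name `generate_skip_gram_pairs` and the toy sentence are assumptions made for illustration, not part of the original algorithm description:

```python
def generate_skip_gram_pairs(tokens, window_size):
    """Generate (target, context) tuples from a list of tokens.

    For each position i, every word within `window_size` positions to the
    left or right of tokens[i] becomes a context word for tokens[i].
    """
    pairs = []
    for i, target in enumerate(tokens):
        # Clip the window at the sentence boundaries.
        start = max(0, i - window_size)
        end = min(len(tokens), i + window_size + 1)
        for j in range(start, end):
            if j != i:  # skip the target word itself
                pairs.append((target, tokens[j]))
    return pairs


tokens = ["the", "dog", "barked", "at", "the", "mailman"]
print(generate_skip_gram_pairs(tokens, window_size=1))
# [('the', 'dog'), ('dog', 'the'), ('dog', 'barked'),
#  ('barked', 'dog'), ('barked', 'at'), ('at', 'barked'),
#  ('at', 'the'), ('the', 'at'), ('the', 'mailman'),
#  ('mailman', 'the')]
```

In practice, words are first mapped to integer IDs so that the resulting tuples can be fed directly to an embedding layer. Depending on your TensorFlow version, the Keras utility tf.keras.preprocessing.sequence.skipgrams provides a similar routine that additionally draws negative samples.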