CBOW and skip-gram are the two models of the Word2Vec framework used in natural language processing. Word2Vec is a neural network approach for learning word embeddings. Before diving into what embeddings are, I have a question for you: how do we make machines understand text?
The core idea behind word embeddings is to convert text into numerical data (a vector space) and capture the semantic as well as syntactic meaning of words and their relationships with other words in a corpus.
In neural network models like CBOW and skip-gram, the input layer is fed with one-hot encoded representations of the words; the word embeddings themselves are the weights the network learns between the input and hidden layers.
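As a quick illustration, here is a minimal sketch of one-hot encoding in plain Python. The toy sentence and the helper name `one_hot` are just illustrative choices, not part of any library:

```python
# Toy sentence for illustration; a real corpus would have a much larger vocabulary.
sentence = ["i", "eat", "pizza", "on", "friday"]
vocab = sorted(set(sentence))                       # ['eat', 'friday', 'i', 'on', 'pizza']
word_to_index = {word: i for i, word in enumerate(vocab)}

def one_hot(word):
    """Return a vector of vocabulary length with a 1 at the word's index."""
    vector = [0] * len(vocab)
    vector[word_to_index[word]] = 1
    return vector

print(one_hot("pizza"))                             # [0, 0, 0, 0, 1]
```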
The Continuous Bag of Words (CBOW) is a Word2Vec model that predicts a target word based on the surrounding context words. It takes a fixed-size context window of words and tries to predict the target word in the middle of the window. The model learns by maximizing the probability of predicting the target word correctly given the context words.
Let’s take a look at a simple example.
Let's say we have the sentence, "I eat pizza on Friday". First, we will tokenize the sentence: ["I", "eat", "pizza", "on", "Friday"]. Now, let's create the training examples for this sentence for the CBOW model, considering a context window of one word on each side of the target (two context words in total); a short code sketch after the examples shows how these pairs can be generated.
Training example 1: Input: ["I", "pizza"], Target: "eat".
Training example 2: Input: ["eat", "on"], Target: "pizza".
Training example 3: Input: ["pizza", "Friday"], Target: "on".
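Here is a minimal sketch of how such (context, target) pairs could be generated in plain Python. The helper name `make_cbow_pairs` and the `half_window` parameter are my own choices for illustration:

```python
def make_cbow_pairs(tokens, half_window=1):
    """Return (context_words, target_word) pairs for CBOW training.

    half_window is the number of words taken on each side of the target.
    Positions without a full context on both sides are skipped for simplicity.
    """
    pairs = []
    for i in range(half_window, len(tokens) - half_window):
        context = tokens[i - half_window:i] + tokens[i + 1:i + 1 + half_window]
        pairs.append((context, tokens[i]))
    return pairs

tokens = ["I", "eat", "pizza", "on", "Friday"]
for context, target in make_cbow_pairs(tokens):
    print(context, "->", target)
# ['I', 'pizza'] -> eat
# ['eat', 'on'] -> pizza
# ['pizza', 'Friday'] -> on
```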
In CBOW, there are typically three main layers involved: the input layer, the hidden layer, and the output layer.
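To make these layers concrete, here is a minimal CBOW model sketch using PyTorch. The library choice, the class name `CBOW`, and the layer sizes are my own assumptions for illustration, not the original Word2Vec implementation:

```python
import torch
import torch.nn as nn

class CBOW(nn.Module):
    """Toy CBOW: average the context embeddings, then score every vocabulary word."""

    def __init__(self, vocab_size, embedding_dim=100):
        super().__init__()
        self.embeddings = nn.Embedding(vocab_size, embedding_dim)  # input -> hidden weights
        self.output = nn.Linear(embedding_dim, vocab_size)         # hidden -> output weights

    def forward(self, context_indices):
        # context_indices: (batch, context_size) tensor of word indices
        embedded = self.embeddings(context_indices)   # (batch, context_size, embedding_dim)
        hidden = embedded.mean(dim=1)                 # average the context vectors
        return self.output(hidden)                    # unnormalized scores over the vocabulary

# Example: one context ["I", "pizza"] encoded as toy word indices 2 and 4.
model = CBOW(vocab_size=5)
scores = model(torch.tensor([[2, 4]]))
print(scores.shape)  # torch.Size([1, 5])
```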
Skip-gram is another neural network architecture used in Word2Vec that predicts the context of a word, given a target word. The input to the skip-gram model is a target word, while the output is a set of context words. The goal of the skip-gram model is to learn the probability distribution of the context words, given the target word.
During training, the skip-gram model is fed with a set of target words and their corresponding context words. The model learns to adjust the weights of the hidden layer to maximize the probability of predicting the correct context words, given the target word.
Let’s take a look at the same example discussed above.
The sentence was, "I eat pizza on Friday". First, we will tokenize the sentence: ["I", "eat", "pizza", "on", "Friday"]. Now, let's create the training examples for this sentence for the skip-gram model, considering the same window of one word on each side of the target (a code sketch for generating these pairs follows the examples).
Training example 1: Input: "eat", Target: ["I", "pizza"].
Training example 2: Input: "pizza", Target: ["eat", "on"].
Training example 3: Input: "on", Target: ["pizza", "Friday"].
Training example 4: Input: "Friday", Target: ["on"].
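Here is a matching sketch for generating skip-gram (target, context) pairs; the helper name `make_skipgram_pairs` is again illustrative. Note that it also treats the edge words "I" and "Friday" as targets, each with fewer context words:

```python
def make_skipgram_pairs(tokens, half_window=1):
    """Return (target_word, context_words) pairs for skip-gram training.

    Every position is used as a target; edge words simply get fewer context words.
    """
    pairs = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - half_window):i]
        right = tokens[i + 1:i + 1 + half_window]
        pairs.append((target, left + right))
    return pairs

tokens = ["I", "eat", "pizza", "on", "Friday"]
for target, context in make_skipgram_pairs(tokens):
    print(target, "->", context)
# I -> ['eat']
# eat -> ['I', 'pizza']
# pizza -> ['eat', 'on']
# on -> ['pizza', 'Friday']
# Friday -> ['on']
```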
Skip-gram and CBOW both aim to learn word embeddings that capture semantic relationships between words. Both are shallow neural network models with an input layer, a hidden layer (whose weights become the word embeddings), and an output layer; they have the same number of parameters, but skip-gram training tends to be more expensive because each target word generates multiple (target, context) training pairs.
Here's a table summarizing the differences between Skip-gram and CBOW.
| | Skip-gram | CBOW |
|---|---|---|
| Architecture | Predicts context words given a target word | Predicts a target word given context words |
| Context Size | Handles large context windows well (e.g., 5-20 words) | Typically used with smaller context windows (e.g., 2-5 words) |
| Training | Slower training, since each target word yields multiple context-word predictions | Faster training, since each context window yields a single target prediction |
| Performance | Performs well with rare words and captures word diversity | Performs well with frequent words and captures word similarity |
| Word Vectors | Dense word vectors, typically 100-300 dimensions | Dense word vectors, typically 100-300 dimensions |
| Model Size | Same parameter count as CBOW; the extra cost comes from more training pairs | Same parameter count as skip-gram; fewer training pairs per sentence |
Note: The choice between skip-gram and CBOW depends on the specific task and dataset. Skip-gram is generally preferred for large datasets and when rare words matter, since it captures word diversity well, while CBOW trains faster and works well with frequent words.
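In practice, this choice often comes down to a single flag. Here is a minimal sketch using the gensim library (assuming gensim 4.x is installed; the toy corpus is purely illustrative), where `sg=0` selects CBOW and `sg=1` selects skip-gram:

```python
from gensim.models import Word2Vec

# Toy corpus: a list of tokenized sentences (illustrative only).
sentences = [["I", "eat", "pizza", "on", "Friday"],
             ["I", "eat", "pasta", "on", "Monday"]]

# sg=0 -> CBOW, sg=1 -> skip-gram; window counts words on each side of the target.
cbow_model = Word2Vec(sentences, vector_size=100, window=2, min_count=1, sg=0)
skipgram_model = Word2Vec(sentences, vector_size=100, window=2, min_count=1, sg=1)

print(cbow_model.wv["pizza"].shape)                  # (100,)
print(skipgram_model.wv.most_similar("pizza", topn=2))
```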
In this answer, we discussed the CBOW and skip-gram models of the Word2Vec framework. The two models offer different approaches to learning word embeddings, trading off training efficiency, semantic capture, and the ability to handle different dataset characteristics. Therefore, choosing between them requires careful consideration of the specific requirements of your task.
Pop question
What is the main difference between CBOW and Skip-gram?
CBOW predicts a target word given context words.
Skip-gram predicts a target word given context words.
CBOW has a larger context window compared to skip-gram.
Skip-gram is faster in training compared to CBOW.