Candidate Sampling

Understand why candidate sampling is used for embedding training.

Chapter Goals:

  • Learn about candidate sampling and why it is useful for embedding training

A. Large vocabularies

Good word embeddings usually require training an embedding model on a large amount of text data. A large corpus in turn tends to have a large vocabulary, often tens of thousands of words. A vocabulary of that size can significantly slow down training.
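To see roughly why vocabulary size matters, here is a minimal sketch. It assumes the model scores every vocabulary word with a full softmax output layer, a common setup that the text above does not spell out. The names full_softmax_loss and multiply_adds_per_example are hypothetical helpers written for this illustration, not part of any library.

```python
import math


def full_softmax_loss(hidden, output_weights, target_id):
    """Cross-entropy loss with a softmax over the entire vocabulary.

    Assumption for illustration: `output_weights` holds one weight row per
    vocabulary word, and `hidden` is the embedding-sized input vector.
    """
    # One dot product (logit) per vocabulary word.
    logits = [sum(h * w for h, w in zip(hidden, row)) for row in output_weights]
    # Subtract the largest logit before exponentiating, for numerical stability.
    max_logit = max(logits)
    log_normalizer = max_logit + math.log(
        sum(math.exp(z - max_logit) for z in logits)
    )
    return log_normalizer - logits[target_id]


def multiply_adds_per_example(vocab_size, embedding_dim):
    """Multiply-add count for the logits of a single training example."""
    # One dot product of length `embedding_dim` for each vocabulary word.
    return vocab_size * embedding_dim


if __name__ == "__main__":
    embedding_dim = 128  # hypothetical embedding size
    for vocab_size in (1_000, 10_000, 50_000):
        cost = multiply_adds_per_example(vocab_size, embedding_dim)
        print(f"vocab_size={vocab_size}: {cost} multiply-adds per example")
```

Under this assumption, the work per training example grows linearly with the vocabulary size, because the normalizing sum in the loss touches every word in the vocabulary.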

Training an embedding model is ...