Implementing GloVe

Learn about the implementation of the GloVe word embedding algorithm.

In this lesson, we’ll discuss the steps for implementing GloVe.

First, we’ll define the hyperparameters:

import random
import numpy as np

batch_size = 4096 # Data points in a single batch
embedding_size = 128 # Dimension of the embedding vector
window_size = 1 # We use a window size of 1 on either side of the target word
epochs = 5 # Number of epochs to train for
# We pick a random validation set to sample nearest neighbors
valid_size = 16 # Size of each random set of words to evaluate similarity on
# Validation IDs are sampled at random from a window of this many word IDs
valid_window = 250
# When selecting valid examples, we select some of the most frequent words as well as
# some moderately rare words
np.random.seed(54321)
random.seed(54321)
# valid_size IDs from the most frequent words (IDs 0 to valid_window - 1)
valid_term_ids = np.array(random.sample(range(valid_window), valid_size))
# valid_size more IDs from moderately rare words (IDs 1000 to 1000 + valid_window - 1)
valid_term_ids = np.append(
    valid_term_ids,
    random.sample(range(1000, 1000 + valid_window), valid_size),
    axis=0
)

These are the same hyperparameters we discussed earlier: a batch size, an embedding size, a window size, the number of epochs, and, finally, a set of held-out validation word IDs for which we’ll print the most similar words.
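
To make the validation sampling easier to inspect, here is a minimal sketch that uses only the standard library. The helper name sample_validation_ids and its rare_offset parameter are hypothetical and not part of the lesson's code. It also assumes that lower word IDs correspond to more frequent words, which is what the comments in the snippet above imply.

```python
import random


def sample_validation_ids(valid_window, valid_size, rare_offset=1000, seed=54321):
    """Hypothetical helper: draw validation IDs from two word-ID ranges."""
    # A private generator keeps the sampling reproducible without touching global state
    rng = random.Random(seed)
    # IDs assumed to belong to the most frequent words: 0 .. valid_window - 1
    frequent = rng.sample(range(valid_window), valid_size)
    # IDs assumed to belong to moderately rare words:
    # rare_offset .. rare_offset + valid_window - 1
    rarer = rng.sample(range(rare_offset, rare_offset + valid_window), valid_size)
    return frequent + rarer


ids = sample_validation_ids(valid_window=250, valid_size=16)
print(len(ids))  # two groups of valid_size IDs each
print(ids[:16])  # IDs drawn from the frequent range
print(ids[16:])  # IDs drawn from the rarer range
```

With valid_size = 16, the validation set holds 32 word IDs in total, because the same number of IDs is drawn from each of the two ranges.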

We’ll then define the model. First, we’ll import a few things we’ll need down the line:

import tensorflow.keras.backend as K
from tensorflow.keras.layers import Input, Embedding, Dot, Add
from tensorflow.keras.models import Model
K.clear_session()

The model is going to have two input layers: word_i and word_j. They represent a batch of context words and a batch of target words (or a batch of positive skip-grams):

# Define two input layers for context and target words
word_i = Input(shape=())
word_j = Input(shape=())

Note how the shape is defined as an empty tuple. The shape argument excludes the batch dimension, so the final shape of word_i and word_j is [None]: each input takes a vector with an arbitrary number of elements, one word ID per data point in the batch.
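
The effect of the empty shape tuple can be checked directly. The following is a small sketch, not the lesson's model: the vocabulary size n_vocab and the single Embedding layer are placeholders chosen only to show how the batch dimension flows through the inputs.

```python
import numpy as np
from tensorflow.keras.layers import Input, Embedding
from tensorflow.keras.models import Model

n_vocab = 1000        # Hypothetical vocabulary size, for illustration only
embedding_size = 128  # Same embedding dimension as in the hyperparameters

# shape=() describes a single data point (one word ID);
# Keras adds the batch dimension in front, giving a shape of (None,)
word_i = Input(shape=())
print(tuple(word_i.shape))

# Looking up embeddings adds one axis: (None, embedding_size)
emb_i = Embedding(input_dim=n_vocab, output_dim=embedding_size)(word_i)
print(tuple(emb_i.shape))

# Because the batch dimension is None, batches of any length are accepted
model = Model(inputs=word_i, outputs=emb_i)
small = model(np.array([3, 14, 15]))
large = model(np.arange(10))
print(tuple(small.shape), tuple(large.shape))
```

The None entry is a placeholder for the batch size, which is why the same model handles a batch of 3 word IDs and a batch of 10 without any change.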

Next, we’re going to define the embedding layers. There will be four embedding layers:

...