...


Generating Data for GloVe

Learn to generate data for GloVe embeddings.


We’ll be using the BBC news articles dataset. It contains 2,225 news articles published on the BBC website between 2004 and 2005, each belonging to one of five topics: business, entertainment, politics, sports, and tech.
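
To feed the data generator, each article first needs to be converted into a list of word IDs. Here is a minimal sketch using the Keras Tokenizer; news_texts is a hypothetical placeholder for the loaded BBC articles:

from tensorflow.keras.preprocessing.text import Tokenizer

# news_texts is a hypothetical stand-in for the loaded BBC articles
news_texts = [
    "uk economy grows faster than expected",
    "new film tops box office this weekend",
]

tokenizer = Tokenizer()                               # builds the word -> ID mapping
tokenizer.fit_on_texts(news_texts)                    # learn the vocabulary from the corpus
sequences = tokenizer.texts_to_sequences(news_texts)  # each article becomes a list of word IDs

print(sequences)  # e.g., [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12, 13]]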

The glove_data_generator() function

Let’s now generate the data. We’ll encapsulate the data generation in a function called glove_data_generator(). As the first step, let’s write the function signature:

def glove_data_generator(sequences, window_size, batch_size, vocab_size, cooccurrence_matrix, x_max=100.0, alpha=0.75, seed=None):

The function takes several arguments:

  • sequences (List[List[int]]): This is a list of lists of word IDs, as produced by the tokenizer’s texts_to_sequences() function.
  • window_size (int): This is the window size for the context.
  • batch_size (int): This is the batch size.
  • vocab_size (int): This is the vocabulary size.
  • cooccurrence_matrix (scipy.sparse.lil_matrix): This is a sparse matrix containing co-occurrences of words.
  • x_max (float): This is a hyperparameter used by GloVe to compute sample weights.
  • alpha (float): This is a hyperparameter used by GloVe to compute sample weights; see the sketch after this list.
  • seed: This is the random seed.
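
To make the role of x_max and alpha concrete, here is a small sketch (not the lesson’s implementation) of the GloVe weighting function f(X_ij) = min((X_ij / x_max)^alpha, 1) that these two hyperparameters control; the helper name compute_sample_weights() is hypothetical:

import numpy as np

def compute_sample_weights(cooc_counts, x_max=100.0, alpha=0.75):
    # f(X_ij) = (X_ij / x_max)^alpha, capped at 1 so very frequent pairs don't dominate
    weights = np.power(np.asarray(cooc_counts, dtype=np.float64) / x_max, alpha)
    return np.minimum(weights, 1.0)

print(compute_sample_weights([1.0, 50.0, 250.0]))  # approximately [0.0316 0.5946 1.0]

With the default x_max=100.0, any word pair that co-occurs 100 or more times receives the maximum weight of 1.0.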

It also has several outputs:

  • A batch of (target, context) word ID tuples.
  • The corresponding log(X_ij)
...