Embedding Words
We're now ready to transform words into word vectors. Embedding words into vectors happens via an embedding table, which is essentially a lookup table: each row holds the word vector of one word. The rows are indexed by word-IDs, so the flow of obtaining a word's word vector is as follows (a short code sketch of the two steps appears after the list):
1. word -> word-ID: Previously, we obtained a word-ID for each word with Keras' Tokenizer. The Tokenizer holds the whole vocabulary and maps each vocabulary word to an ID, which is an integer.
2. word-ID -> word vector: A word-ID is an integer and can therefore be used as an index into the embedding table's rows. Each word-ID corresponds to one row; to get a word's word vector, we first obtain its word-ID and then look up that row in the embedding table.
The following diagram shows how embedding words into word vectors works:
Remember that in the previous lesson, we started with a list of sentences. Then we did the following:
1. We broke each sentence into words and built a vocabulary with Keras' Tokenizer.
2. The Tokenizer object held a word index, which was a word -> word-ID mapping.
3. After obtaining the word-ID, we could look up the corresponding row in the embedding table and get a word vector.
Finally, we fed this word vector to the neural network.
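In Keras, this embedding-table lookup is typically handled by the Embedding layer, so the whole pipeline can be wired together in a few lines. The sketch below is a minimal illustration under assumed data and layer sizes; the pooling and output layers are placeholders, not the architecture designed in the upcoming lessons.

```python
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, GlobalAveragePooling1D, Dense

sentences = ["the cat sat on the mat", "the dog ate my homework"]  # assumed toy data

tokenizer = Tokenizer()
tokenizer.fit_on_texts(sentences)                  # word -> word-ID mapping
sequences = tokenizer.texts_to_sequences(sentences)
padded = pad_sequences(sequences, maxlen=6)        # equal-length lists of word-IDs

vocab_size = len(tokenizer.word_index) + 1
model = Sequential([
    Embedding(input_dim=vocab_size, output_dim=8),  # the embedding table as a layer
    GlobalAveragePooling1D(),                       # placeholder: average the word vectors
    Dense(1, activation="sigmoid"),                 # placeholder output layer
])
print(model(padded).shape)                          # (2, 1): one output per sentence
```

The Embedding layer's weights are the embedding table itself; they are initialized randomly and updated during training like any other layer's weights.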
Training a neural network takes preparation: we had to go through several steps just to transform sentences into vectors. With these preliminary steps done, we're ready to design the neural network architecture and train the model.