Embeddings
Learn the basics of word embeddings and why they're used.
Chapter Goals:
- Learn about word embeddings and why they’re used
- Create a function that retrieves a target word and its context
A. Word representation
So far, the way we’ve represented vocabulary words is with unique integer IDs. However, these integer IDs don’t give a sense of how different words may be related. For example, if a vocabulary gave the words “milk” and “cereal” the IDs 14 and 103, respectively, we would have no idea that these two words are often used within the same context.
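To make this concrete, here is a minimal sketch (the vocabulary and ID values are made up for illustration) showing that integer IDs identify words but encode nothing about how they relate:

```python
# Integer IDs identify words but carry no semantic information.
word_to_id = {"milk": 14, "granite": 15, "cereal": 103}

# "granite" (ID 15) is numerically adjacent to "milk" (ID 14),
# yet "cereal" (ID 103) is the word actually related to "milk".
print(abs(word_to_id["milk"] - word_to_id["granite"]))  # 1
print(abs(word_to_id["milk"] - word_to_id["cereal"]))   # 89
```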
The solution to this problem is to convert each word into an embedding vector. An embedding vector is a higher-dimensional vector representation of a vocabulary word. Since vectors have a sense of distance (as they are just points in a higher-dimensional space), embedding vectors give us a word representation that captures relationships between words.
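As a rough sketch of what "distance" buys us, the snippet below compares hand-picked (not learned) embedding vectors with cosine similarity; a trained model would produce many more dimensions, but the idea is the same:

```python
import numpy as np

# Hypothetical 4-dimensional embedding vectors, hand-picked for illustration.
# Real models learn these values from data and use far more dimensions.
embeddings = {
    "milk":    np.array([0.9, 0.1, 0.4, 0.7]),
    "cereal":  np.array([0.8, 0.2, 0.5, 0.6]),
    "granite": np.array([-0.3, 0.9, -0.6, 0.1]),
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: values near 1 mean similar."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_similarity(embeddings["milk"], embeddings["cereal"]))   # high (~0.99)
print(cosine_similarity(embeddings["milk"], embeddings["granite"]))  # low (negative)
```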
Below is an example of word embedding vectors from the TensorFlow Embedding Projector. Note that the words most similar to “sheep” ...