Biased Vectors

Learn about biases included in word vectors

Biases in word vectors

The relationships encoded between word vectors are a key reason for their widespread use. Embeddings trained on large corpora capture intricate relationships between words and concepts, and downstream models can exploit this knowledge to achieve better results. Let’s look at how this works and at some potential pitfalls.
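One classic illustration of these relationships is vector arithmetic on analogies, such as king − man + woman landing near queen. The sketch below uses tiny, hand-crafted 3-dimensional vectors purely for illustration (real embeddings are learned from data and have hundreds of dimensions, and the dimension meanings here are an assumption of this toy setup):

```python
import numpy as np

# Toy vectors (hypothetical values for illustration only; real embeddings
# are learned from large corpora). Here dimension 0 loosely encodes "male",
# dimension 1 "female", and dimension 2 "royalty".
vectors = {
    "man":   np.array([1.0, 0.0, 0.0]),
    "woman": np.array([0.0, 1.0, 0.0]),
    "king":  np.array([1.0, 0.0, 1.0]),
    "queen": np.array([0.0, 1.0, 1.0]),
    "car":   np.array([0.2, 0.2, 0.0]),
}

def nearest(target, exclude):
    """Return the vocabulary word whose vector is closest (by cosine) to target."""
    def cos(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    candidates = {w: v for w, v in vectors.items() if w not in exclude}
    return max(candidates, key=lambda w: cos(candidates[w], target))

# king - man + woman lands nearest to queen, mirroring the classic analogy.
result = nearest(vectors["king"] - vectors["man"] + vectors["woman"],
                 exclude={"king", "man", "woman"})
print(result)  # queen
```

With real embeddings (e.g., word2vec or GloVe), the same arithmetic is performed over the learned vectors, and the nearest neighbor is found across the whole vocabulary rather than this tiny hand-built one.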

We’ve already noted that related words tend to produce similar vectors. This property lets models generalize more effectively. Consider the following example: suppose “Amy is happy because of her new car” appears in the training set of a sentiment classification model. If we replace “car” with “van,” the vector for “van” will be close to the vector for “car,” so the model’s prediction should remain largely the same. As a result, there’s no need to include every synonym of “car” in the training set: the model can still handle similar, yet-unseen words correctly.
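The closeness of “car” and “van” can be measured with cosine similarity. The sketch below uses tiny, hand-crafted vectors (hypothetical values, chosen only to illustrate the point; real embeddings are learned and much higher-dimensional):

```python
import numpy as np

# Toy 4-dimensional vectors (hypothetical values for illustration only).
# "car" and "van" are given nearby vectors; "happy" points elsewhere.
vectors = {
    "car":   np.array([0.9, 0.1, 0.3, 0.0]),
    "van":   np.array([0.8, 0.2, 0.4, 0.1]),
    "happy": np.array([0.1, 0.9, 0.0, 0.8]),
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Related words score high; unrelated words score much lower.
print(cosine_similarity(vectors["car"], vectors["van"]))    # close to 1
print(cosine_similarity(vectors["car"], vectors["happy"]))  # much lower
```

Because the model consumes vectors rather than raw tokens, swapping “car” for “van” changes its input only slightly, which is why the prediction stays roughly consistent.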
