...

Classical Approaches to Learning Word Representations

Learn about the classical approaches to word representation.

Word representations

What is meant by the word “meaning”? This is more of a philosophical question than a technical one, so we won’t try to find the definitive answer; instead, we’ll accept a more modest one: meaning is the idea conveyed by, or some representation associated with, a word. For example, when we hear the word “cat,” we conjure up a mental picture of something that meows, has four legs, has a tail, and so on; if we then hear the word “dog,” we again form a mental image of something that barks, has a bigger body than a cat, has four legs, has a tail, and so on. In this new space (that is, the space of mental pictures), it’s easier to see that cats and dogs are similar than it is by looking at the words alone.

Since the primary objective of NLP is to achieve human-like performance in linguistic tasks, it’s sensible to explore principled ways of representing words for machines. To achieve this, we’ll use algorithms that can analyze a given text corpus and come up with good numerical representations of words (that is, word embeddings) so that words that fall within similar contexts (for example, “one” and “two,” “I” and “we”) will have similar numerical representations compared to words that are unrelated (for example, “cat” and “volcano”).
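To make “similar numerical representations” concrete, here is a minimal sketch that compares a few made-up 3-dimensional word vectors with cosine similarity. The vector values and the `cosine_similarity` helper are illustrative assumptions, not something defined in this lesson:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; close to 1 when they point in similar directions."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Made-up 3-dimensional embeddings, purely for illustration
one = np.array([0.9, 0.1, 0.0])
two = np.array([0.8, 0.2, 0.1])
volcano = np.array([0.0, 0.2, 0.9])

print(cosine_similarity(one, two))      # ~0.98: "one" and "two" occur in similar contexts
print(cosine_similarity(one, volcano))  # ~0.02: "one" and "volcano" are unrelated
```

A good embedding algorithm learns vectors with this property automatically from a text corpus, rather than having them handpicked as in this toy example.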

Classical approaches to learning word representation

In this section, we’ll discuss some of the classical approaches used for numerically representing words. It’s important to have an understanding of the alternatives to word vectors because these methods are still used in the real world, especially when limited data is available.

More specifically, we’ll discuss common representations, such as one-hot encoding and term frequency-inverse document frequency (TF-IDF).

One-hot encoded representation

One of the simpler ways of representing words is to use the one-hot encoded representation. This means that if we have a vocabulary of size $V$, for each $i^{th}$ word $w_i$, we’ll represent the word $w_i$ ...
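To illustrate the idea, here is a minimal sketch of one-hot encoding using NumPy; the toy vocabulary and the `one_hot_encode` helper are assumptions made for this example and don’t appear in the lesson itself:

```python
import numpy as np

def one_hot_encode(word, vocabulary):
    """Return a V-length vector with a 1 at the word's index and 0 everywhere else."""
    vector = np.zeros(len(vocabulary))
    vector[vocabulary.index(word)] = 1.0
    return vector

vocabulary = ["cat", "dog", "volcano", "one", "two"]  # toy vocabulary, V = 5
print(one_hot_encode("dog", vocabulary))              # [0. 1. 0. 0. 0.]
```

Note that each vector has exactly one non-zero element, so this representation captures no notion of similarity between words: the vectors for “cat” and “dog” are exactly as far apart as those for “cat” and “volcano.”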