Classical Approaches to Learning Word Representations
Learn about the classical approaches to word representation.
Word representations
What is meant by the word “meaning”? This is more of a philosophical question than a technical one, so rather than trying to settle it, we’ll accept a more modest answer: meaning is the idea conveyed by, or some representation associated with, a word. For example, when we hear the word “cat,” we conjure up a mental picture of something that meows, has four legs, has a tail, and so on; if we hear the word “dog,” we again form a mental image of something that barks, has a bigger body than a cat, has four legs, has a tail, and so on. In this new space of mental pictures, it’s easier to see that cats and dogs are similar to each other than it is by looking at the words alone.
Since the primary objective of NLP is to achieve human-like performance in linguistic tasks, it’s sensible to explore principled ways of representing words for machines. To achieve this, we’ll use algorithms that can analyze a given text corpus and come up with good numerical representations of words (that is, word embeddings) so that words that fall within similar contexts (for example, “one” and “two,” “I” and “we”) will have similar numerical representations compared to words that are unrelated (for example, “cat” and “volcano”).
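To make the idea of “similar contexts lead to similar vectors” concrete, here is a minimal sketch using hand-picked, hypothetical 3-dimensional vectors (the values are invented purely for illustration; real embeddings are learned from a corpus and typically have hundreds of dimensions). Related words end up with a high cosine similarity, unrelated ones with a low one:

```python
import numpy as np

# Hypothetical, hand-picked embeddings for illustration only;
# real embeddings are learned automatically from text.
embeddings = {
    "one": np.array([0.9, 0.1, 0.0]),
    "two": np.array([0.8, 0.2, 0.1]),
    "cat": np.array([0.1, 0.9, 0.2]),
    "volcano": np.array([0.0, 0.1, 0.9]),
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means identical direction."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_similarity(embeddings["one"], embeddings["two"]))      # high (~0.98)
print(cosine_similarity(embeddings["cat"], embeddings["volcano"]))  # low  (~0.32)
```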
Classical approaches to learning word representations
In this section, we’ll discuss some of the classical approaches used for numerically representing words. It’s important to understand these alternatives to word vectors because they are still used in the real world, especially when only limited data is available.
More specifically, we’ll discuss common representations, such as one-hot encoding and term frequency-inverse document frequency (TF-IDF).
One-hot encoded representation
One of the simpler ways of representing words is to use the one-hot encoded representation. This means that if we have a vocabulary of size V, we’ll represent the i-th word w_i with a V-long vector [0, 0, ..., 0, 1, 0, ..., 0] whose i-th element is 1 and whose other elements are all 0.
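As a concrete illustration, here is a minimal Python sketch of one-hot encoding over a toy five-word vocabulary (the vocabulary is invented for this example):

```python
# A minimal sketch of one-hot encoding over a toy vocabulary.
vocabulary = ["cat", "dog", "one", "two", "volcano"]
word_to_index = {word: i for i, word in enumerate(vocabulary)}

def one_hot(word):
    """Return a V-long list with a single 1 at the word's index."""
    vector = [0] * len(vocabulary)
    vector[word_to_index[word]] = 1
    return vector

print(one_hot("cat"))  # [1, 0, 0, 0, 0]
print(one_hot("dog"))  # [0, 1, 0, 0, 0]
```

Note that any two distinct one-hot vectors have a dot product of 0, so this representation by itself encodes no notion of similarity between words.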