...

Classical Approaches to Learning Word Representations

Learn about the classical approaches to word representation.

Word representations

What is meant by the word “meaning”? This is more of a philosophical question than a technical one, so we won’t try to find the definitive answer; instead, we’ll accept a more modest one: meaning is the idea conveyed by, or some representation associated with, a word. For example, when we hear the word “cat,” we conjure up a mental picture of something that meows, has four legs, has a tail, and so on; if we then hear the word “dog,” we again form a mental image of something that barks, has a bigger body than a cat, has four legs, has a tail, and so on. In this new space (that is, the space of mental pictures), it’s easier to see that cats and dogs are similar than it is by looking at the words alone.

Since the primary objective of NLP is to achieve human-like performance in linguistic tasks, it’s sensible to explore principled ways of representing words for machines. To achieve this, we’ll use algorithms that can analyze a given text corpus and come up with good numerical representations of words (that is, word embeddings) so that words that fall within similar contexts (for example, “one” and “two,” “I” and “we”) will have similar numerical representations compared to words that are unrelated (for example, “cat” and “volcano”).
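To make “similar numerical representations” concrete, here is a minimal sketch that compares a few made-up 3-dimensional word vectors with cosine similarity. The vector values and the `cosine_similarity` helper are illustrative assumptions, not something defined in this lesson:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; close to 1 when they point in similar directions."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Made-up 3-dimensional embeddings, purely for illustration
one = np.array([0.9, 0.1, 0.0])
two = np.array([0.8, 0.2, 0.1])
volcano = np.array([0.0, 0.2, 0.9])

print(cosine_similarity(one, two))      # ~0.98: "one" and "two" occur in similar contexts
print(cosine_similarity(one, volcano))  # ~0.02: "one" and "volcano" are unrelated
```

A good embedding algorithm learns vectors with this property automatically from a text corpus, rather than having them handpicked as in this toy example.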

Classical approaches to learning word representation

In this section, we’ll discuss some of the classical approaches used for numerically representing words. It’s important to have an understanding of the alternatives to word vectors because these methods are still used in the real world, especially when limited data is available.

More specifically, we’ll discuss common representations, such as one-hot encoding and term frequency-inverse document frequency (TF-IDF).

One-hot encoded representation

One of the simpler ways of representing words is to use the one-hot encoded representation. This means that if we have a vocabulary of size $V$, for each $i^{th}$ word $w_i$, we’ll represent the word $w_i$ ...
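To illustrate the idea, here is a minimal sketch of one-hot encoding using NumPy; the toy vocabulary and the `one_hot_encode` helper are assumptions made for this example and don’t appear in the lesson itself:

```python
import numpy as np

def one_hot_encode(word, vocabulary):
    """Return a V-length vector with a 1 at the word's index and 0 everywhere else."""
    vector = np.zeros(len(vocabulary))
    vector[vocabulary.index(word)] = 1.0
    return vector

vocabulary = ["cat", "dog", "volcano", "one", "two"]  # toy vocabulary, V = 5
print(one_hot_encode("dog", vocabulary))              # [0. 1. 0. 0. 0.]
```

Note that each vector has exactly one non-zero element, so this representation captures no notion of similarity between words: the vectors for “cat” and “dog” are exactly as far apart as those for “cat” and “volcano.”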