...


Transformer Architecture: Embedding Layers

Learn about the embedding layers in the transformer.

Word embeddings provide a semantic-preserving representation of words based on the context in which words are used. In other words, if two words are used in the same contexts, they will have similar word vectors. For example, the words “cat” and “dog” will have similar representations, whereas “cat” and “volcano” will have vastly different representations.
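As a rough illustration, cosine similarity is commonly used to compare word vectors. The vectors and values below are made up for demonstration only, not taken from a trained model:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two word vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional vectors (illustrative values, not from a real model).
cat = np.array([0.8, 0.1, 0.6, 0.2])
dog = np.array([0.7, 0.2, 0.5, 0.3])
volcano = np.array([-0.4, 0.9, -0.2, 0.1])

print(cosine_similarity(cat, dog))      # relatively high -> similar contexts
print(cosine_similarity(cat, volcano))  # much lower -> dissimilar contexts
```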

Word vectors were initially introduced in the paper titled Efficient Estimation of Word Representations in Vector Space by Mikolov et al. (https://arxiv.org/pdf/1301.3781.pdf). The approach came in two variants: skip-gram and continuous bag-of-words. Embeddings work by first defining a large matrix of size V×E ...
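A minimal sketch of this idea follows, assuming the standard reading of the notation in which V is the vocabulary size and E is the embedding dimension; the toy vocabulary, sizes, and random initialization are assumptions for illustration, not the paper's training procedure:

```python
import numpy as np

# Assumed toy setup: V = vocabulary size, E = embedding dimension.
vocab = ["cat", "dog", "volcano", "the", "sat"]
V, E = len(vocab), 8

rng = np.random.default_rng(0)
embedding_matrix = rng.normal(size=(V, E))  # the V x E matrix of word vectors

word_to_index = {word: i for i, word in enumerate(vocab)}

def embed(word: str) -> np.ndarray:
    """Look up a word's vector: row word_to_index[word] of the V x E matrix."""
    return embedding_matrix[word_to_index[word]]

print(embed("cat").shape)  # (E,) -> one E-dimensional vector per word
```

In practice, deep learning frameworks expose this lookup directly (for example, torch.nn.Embedding(V, E) in PyTorch), and the matrix is learned during training rather than left at its random initialization.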