Transformer Visualization via Dictionary Learning

Learn about how transformer layers are visualized through dictionary learning.

Transformer visualization via dictionary learning is based on transformer factors.

Transformer factors

A transformer factor is an embedding vector that represents a contextualized word sense. A word with no context can have many meanings, creating a polysemy issue. For example, the word “separate” can be a verb or an adjective. Furthermore, “separate” can mean disconnect, discriminate, or scatter, among many other definitions.

Yun et al., 2021, therefore proposed representing contextualized words with such embedding vectors in the paper “Transformer visualization via dictionary learning: contextualized embedding as a linear superposition of transformer factors” (available at https://arxiv.org/abs/2103.15949). A word embedding vector can be constructed as a sparse linear combination of transformer factors. For example, depending on the context of the sentences in a dataset, “separate” can be represented as a weighted sum of a few active transformer factors, with the coefficients of all other factors set to zero.
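The idea of a sparse linear superposition can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's released code: the dictionary `Phi`, the coefficient vector `alpha`, and the chosen factor indices are all hypothetical values made up for the example.

```python
import numpy as np

# Hypothetical dictionary of transformer factors: each column of Phi
# is one transformer factor (an embedding-sized vector).
rng = np.random.default_rng(0)
d, num_factors = 8, 20                    # embedding size, dictionary size
Phi = rng.normal(size=(d, num_factors))

# A sparse code: only a few factors are active for a given context,
# e.g. "separate" used as a verb meaning "disconnect".
alpha = np.zeros(num_factors)
alpha[[2, 7, 11]] = [1.5, -0.8, 0.4]

# The contextualized embedding is a linear superposition of the
# active transformer factors.
x = Phi @ alpha

# Sanity check: x equals the weighted sum of the three active columns.
reconstruction = 1.5 * Phi[:, 2] - 0.8 * Phi[:, 7] + 0.4 * Phi[:, 11]
assert np.allclose(x, reconstruction)
print("active factors:", np.nonzero(alpha)[0])
```

In the paper, the coefficients are not hand-set as above but are learned by dictionary learning, so that each factor tends to capture a recurring pattern such as a specific word sense.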
