Summary: Interpreting Black Box Transformer Models

Revisit the concepts covered in this section with this summary lesson.

Transformer models are trained to resolve word-level polysemy disambiguation as well as low-level, mid-level, and high-level dependencies. This is achieved by training million- to trillion-parameter models. The task of interpreting these giant models seems daunting. However, several tools are emerging.

We first imported BertViz and learned how to interpret the computations of the attention heads through an interactive interface. We saw how words interact with other words at each layer.
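The following is a minimal sketch of how BertViz can render attention heads, assuming the `bertviz` and `transformers` packages are installed; the model name and sentence are illustrative choices, not the chapter's exact inputs.

```python
# Sketch: visualizing BERT attention heads with BertViz
from transformers import BertTokenizer, BertModel
from bertviz import head_view

model_name = "bert-base-uncased"  # illustrative model choice
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertModel.from_pretrained(model_name, output_attentions=True)

sentence = "The cat sat on the mat because it was tired."
inputs = tokenizer.encode(sentence, return_tensors="pt")
outputs = model(inputs)

# outputs.attentions is a tuple of per-layer attention tensors
attention = outputs.attentions
tokens = tokenizer.convert_ids_to_tokens(inputs[0])

# Renders an interactive view (in a notebook) of how each head
# attends from every token to every other token, layer by layer
head_view(attention, tokens)
```

Running this in a notebook produces the interactive head view, where selecting a layer and head shows which words attend to which other words.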

The chapter continued by defining the scope of probing and non-probing tasks. Probing tasks such as NER provide insights into how a transformer model represents language. Non-probing methods, in contrast, analyze how the model makes predictions. For example, LIT plugged PCA projections and UMAP representations into the outputs of a BERT transformer model. We could then analyze clusters of outputs to see how they fit together, as illustrated in the sketch below.
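This is an illustrative stand-in for LIT's embedding projector, not the LIT tool itself: a sketch, assuming `transformers` and `scikit-learn`, that reduces BERT sentence embeddings with PCA so clusters of outputs can be inspected. The sentences are made-up examples.

```python
# Sketch: projecting BERT output embeddings to 2D with PCA
import torch
from transformers import BertTokenizer, BertModel
from sklearn.decomposition import PCA

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

sentences = [
    "The market rallied after the announcement.",
    "Stocks fell sharply on Friday.",
    "The cat slept on the sofa.",
    "My dog loves playing fetch.",
]

with torch.no_grad():
    embeddings = []
    for s in sentences:
        inputs = tokenizer(s, return_tensors="pt")
        outputs = model(**inputs)
        # Use the [CLS] token's hidden state as a sentence representation
        embeddings.append(outputs.last_hidden_state[0, 0])
    embeddings = torch.stack(embeddings).numpy()

# Project the 768-dimensional vectors down to 2D for visualization
points = PCA(n_components=2).fit_transform(embeddings)

for sentence, (x, y) in zip(sentences, points):
    print(f"({x:+.2f}, {y:+.2f})  {sentence}")
```

Plotting the resulting points typically shows the finance-related sentences clustering away from the pet-related ones, which is the kind of grouping LIT makes visible interactively.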

Finally, we ran transformer visualization via dictionary learning. A user can choose a transformer factor to analyze and visualize the evolution of its representation from the lower to the higher layers of the transformer. The factor progressively goes from polysemy disambiguation to sentence-context analysis and, finally, to long-term dependencies, as sketched below.
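The following is a conceptual sketch of the dictionary-learning idea rather than the original visualization tool: hidden states from all BERT layers are decomposed into sparse factors with scikit-learn's `DictionaryLearning`, and we track how strongly one chosen factor activates at each layer. The sentences, factor index, and dictionary size are arbitrary illustrative choices.

```python
# Sketch: sparse "transformer factors" from BERT hidden states
import numpy as np
import torch
from transformers import BertTokenizer, BertModel
from sklearn.decomposition import DictionaryLearning

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_hidden_states=True)

sentences = [
    "He went to the bank to deposit a check.",
    "She sat on the bank of the river.",
    "The bank approved the loan yesterday.",
]

# Collect token hidden states from every layer (embeddings + 12 layers)
per_sentence_states = []
with torch.no_grad():
    for s in sentences:
        inputs = tokenizer(s, return_tensors="pt")
        per_sentence_states.append(model(**inputs).hidden_states)

# Fit a small sparse dictionary on hidden states pooled across all layers
all_vectors = np.vstack([
    layer[0].numpy()
    for states in per_sentence_states
    for layer in states
])
dico = DictionaryLearning(n_components=32, alpha=1.0, max_iter=200,
                          random_state=0)
dico.fit(all_vectors)

# Track the mean activation of one chosen factor, layer by layer,
# to see how its role evolves from lower to higher layers
factor = 0  # arbitrary factor index for illustration
n_layers = len(per_sentence_states[0])
for layer_idx in range(n_layers):
    vectors = np.vstack([states[layer_idx][0].numpy()
                         for states in per_sentence_states])
    codes = dico.transform(vectors)
    print(f"layer {layer_idx:2d}  mean |activation| of factor {factor}: "
          f"{np.abs(codes[:, factor]).mean():.4f}")
```

Inspecting which tokens and layers activate a given factor most strongly is what lets a user follow that factor's progression from word-level disambiguation toward longer-range context.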
