Transformer visualization via dictionary learning
Transformer visualization via dictionary learning is based on transformer factors. The goal is to analyze how a transformer represents words in their context.
Transformer factors
A transformer factor is an embedding vector that represents a contextualized word. A word taken out of context can have many meanings, creating a polysemy issue. For example, the word "separate" can be a verb or an adjective. Furthermore, "separate" can mean disconnect, discriminate, scatter, and many other things. Yun et al. (2021) thus created embedding vectors of contextualized words. A word embedding vector can be constructed as a sparse linear combination of transformer factors. For example, depending on the context of the sentences in a dataset, "separate" can be represented as:
separate = 0.3 "keep apart" + 0.3 "distinct" + 0.1 "discriminate" + 0.1 "sever" + 0.1 "disperse" + 0.1 "scatter" + ...
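The sparse linear combination above can be sketched in a few lines of NumPy. The factor vectors below are random placeholders, not the actual dictionary elements learned by Yun et al. (2021); the coefficients are the ones from the example.

```python
import numpy as np

# Hypothetical transformer factors: one dense vector per meaning direction.
# In the paper, these come from dictionary learning on hidden states;
# random vectors are used here purely for illustration.
rng = np.random.default_rng(0)
dim = 8
factors = {
    "keep apart": rng.normal(size=dim),
    "distinct": rng.normal(size=dim),
    "discriminate": rng.normal(size=dim),
    "sever": rng.normal(size=dim),
    "disperse": rng.normal(size=dim),
    "scatter": rng.normal(size=dim),
}

# Sparse coefficients taken from the example in the text; most other
# factors in a real dictionary would have coefficient zero.
coeffs = {
    "keep apart": 0.3,
    "distinct": 0.3,
    "discriminate": 0.1,
    "sever": 0.1,
    "disperse": 0.1,
    "scatter": 0.1,
}

# The contextual embedding of "separate" is the weighted sum of factors.
separate = sum(coeffs[name] * vec for name, vec in factors.items())
print(separate.shape)  # prints (8,)
```

The key property is sparsity: only a handful of the dictionary's factors contribute to any one contextualized word, which is what makes the decomposition interpretable.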