Summary
Transformer models are trained to resolve word-level polysemy disambiguation and low-level, mid-level, and high-level dependencies. This is achieved by training million- to trillion-parameter models. The task of interpreting these giant models seems daunting. However, several tools are emerging.

We first installed BertViz. We learned how to interpret the computations of the attention heads through an interactive interface. We saw how words interacted with other words at each layer. We then introduced ExBERT, another approach to visualizing BERT, among other models.

The chapter continued by defining SHAP and revealing the contribution of each word processed by Hugging Face transformers. We also introduced LIME, which explains a prediction by perturbing the input and fitting a local, interpretable model around it. We then ran transformer visualization via dictionary learning. A user can choose a transformer factor to analyze and visualize the evolution of its representation from the lower layers to the higher layers of the transformer. The factor will progressively go from polysemy disambiguation...
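The perturbation idea behind word-level attribution methods such as LIME and SHAP can be sketched in a few lines. The snippet below is a toy illustration only: the "model" is a hypothetical keyword scorer standing in for a real transformer, and the attribution is the simplest possible occlusion test (remove one word, measure the score change), not the actual LIME or SHAP algorithm.

```python
# Toy sketch of occlusion-based word attribution, the intuition
# behind LIME/SHAP explanations. The classifier here is a
# hypothetical keyword scorer, NOT a real transformer.

def score(tokens):
    """Stand-in 'model': fraction of tokens that are positive keywords."""
    positive = {"great", "good", "excellent"}
    return sum(1 for t in tokens if t in positive) / max(len(tokens), 1)

def word_contributions(sentence):
    """Drop each word in turn; the score drop is its contribution."""
    tokens = sentence.lower().split()
    base = score(tokens)
    contributions = {}
    for i, tok in enumerate(tokens):
        perturbed = tokens[:i] + tokens[i + 1:]
        contributions[tok] = base - score(perturbed)
    return contributions

contribs = word_contributions("the movie was great")
# The sentiment-bearing word receives the largest contribution.
```

Real LIME and SHAP refine this idea: LIME samples many perturbations and fits a weighted linear surrogate, while SHAP averages marginal contributions over feature coalitions to obtain Shapley values.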