Transformer visualization with BertViz
Jesse Vig's 2019 article, A Multiscale Visualization of Attention in the Transformer Model, recognizes the effectiveness of transformer models but explains that deciphering the attention mechanism is challenging. The paper describes BertViz, a visualization tool.

BertViz can visualize attention head activity and help interpret a transformer model's behavior.

BertViz was first designed to visualize BERT and GPT models. In this section, we will visualize the activity of a BERT model.

Some tools use the term "interpretable," stressing the "why" of an output. Others use the term "explainable" to describe "how" an output is reached. Finally, some ignore the nuance and use the two terms interchangeably, because explaining "why" an output was produced often comes down to explaining "how" it was produced. We will use the terms loosely, as the tools in this chapter most often do.

Let...