Understanding self-attention
The Transformer neural network architecture revolves around the self-attention mechanism, so let’s kick off the chapter by looking at it. Self-attention is a mechanism used in machine learning, particularly in NLP and computer vision, that allows a model to weigh the importance of different parts of an input sequence when processing each of its elements.
Self-attention is a specific type of attention mechanism. In traditional attention mechanisms, the importance weights are computed between two different sequences. For example, an attention-based English-to-French translation model may focus on specific parts (e.g., nouns, verbs) of the English source sentence that are relevant to the current French target word being generated. In self-attention, however, the importance weighting operates between any two elements within the same input sequence; it captures how different parts of one sequence relate to each other. Applied to English-to-French translation, self-attention would compute importance weights among the words of the English sentence itself, so that each word’s representation reflects its relationships with the surrounding words.
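
To make this concrete, here is a minimal NumPy sketch of single-head, scaled dot-product self-attention. The function and variable names (self_attention, W_q, W_k, W_v) are illustrative choices for this example, not taken from any particular library:

import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    # X: (seq_len, d_model) token embeddings of one sequence
    # W_q, W_k, W_v: (d_model, d_k) learned projection matrices
    Q = X @ W_q               # queries
    K = X @ W_k               # keys
    V = X @ W_v               # values
    d_k = Q.shape[-1]
    # Importance weights between every pair of positions in the same sequence
    scores = Q @ K.T / np.sqrt(d_k)      # (seq_len, seq_len)
    weights = softmax(scores, axis=-1)
    # Each output position is a weighted mix of all value vectors
    return weights @ V, weights

# Toy example: a 4-token "sentence" with 8-dimensional embeddings
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
output, weights = self_attention(X, W_q, W_k, W_v)
print(weights.round(2))   # each row sums to 1: how much that token attends to every token

Note that the (seq_len, seq_len) weight matrix is exactly the within-sequence importance weighting described above: row i tells us how strongly element i attends to every other element of the same sequence.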