How Transformers work
Moving on to transformers in general, Figure 1.8 shows the structure of a transformer:
Figure 1.8: Architecture of a Transformer: an encoder for the inputs and a decoder for the outputs (reproduced from Zahere)
You can see that it has an encoder and a decoder. The encoder learns the patterns in the input data and builds an internal representation of them; the decoder uses that representation to generate the output.
The encoder consists of multiple neural network layers. In transformers, each layer uses self-attention, which lets the encoder work out how the different parts of a sentence relate to one another and capture their context.
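To make self-attention concrete, here is a minimal NumPy sketch of scaled dot-product self-attention, the core operation inside each encoder layer. The embedding sizes and random weights are made up purely for illustration:

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    # x: (seq_len, d_model) token embeddings
    # w_q, w_k, w_v: (d_model, d_k) learned projection matrices
    q = x @ w_q                                     # queries: what each token looks for
    k = x @ w_k                                     # keys: what each token offers
    v = x @ w_v                                     # values: the content to mix
    scores = q @ k.T / np.sqrt(k.shape[-1])         # pairwise relevance of tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: attention weights per token
    return weights @ v                              # context-aware token representations

# Toy run: 4 tokens with 8-dimensional embeddings (made-up sizes)
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)  # (4, 8)
```

Each output row is a weighted blend of all the value vectors, which is exactly how a token's representation comes to reflect the rest of the sentence.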
Here is a quick version of the transformer process (a runnable sketch follows the list):
- Encoder network:
  - Uses multiple layers of neural networks.
  - Each layer employs self-attention to understand relationships between sentence parts and context.
  - Creates a compressed representation of the input.
- Decoder network:
  - Utilizes the encoder's representation for generating new outputs.
  - Employs multiple layers, each combining self-attention with attention over the encoder's representation, to generate the output one token at a time.
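As a rough, end-to-end sketch of this encoder-decoder flow, the example below runs PyTorch's built-in nn.Transformer on random tensors. The model sizes, sequence lengths, and inputs are arbitrary placeholders for illustration, not a real training setup:

```python
import torch
import torch.nn as nn

# Toy dimensions, chosen only for illustration
d_model, n_heads, n_layers = 64, 4, 2

model = nn.Transformer(
    d_model=d_model,
    nhead=n_heads,
    num_encoder_layers=n_layers,
    num_decoder_layers=n_layers,
    batch_first=True,
)

src = torch.randn(1, 10, d_model)  # encoder input: 10 source-token embeddings
tgt = torch.randn(1, 7, d_model)   # decoder input: 7 target-token embeddings
out = model(src, tgt)              # decoder output, conditioned on the encoded input
print(out.shape)                   # torch.Size([1, 7, 64])
```

In practice, the decoder also needs a causal mask over its own inputs so that each position can only attend to earlier positions, but the skeleton above is enough to show how the encoder's representation feeds the decoder.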