Sublayer
Each layer contains sublayers, as shown in Figure I.2. Corresponding sublayers have an identical structure in every layer, which makes the model easier to optimize for hardware.
In the original Transformer, each encoder layer contains two sublayers, listed here from bottom to top:
- A self-attention sublayer, designed specifically for NLP and hardware optimization
- A classical feedforward network with some tweaking
Figure I.2: A layer contains two sublayers
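The two-sublayer structure can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the original implementation: the function names (`self_attention`, `feedforward`, `layer`, `stack`) and the toy weights are hypothetical, attention uses the input directly as queries, keys, and values (no learned projections, single head), and the residual connections and layer normalization that wrap each sublayer in the original model are left out for brevity.

```python
import math

def softmax(row):
    # Numerically stable softmax over one list of scores.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    # Plain matrix product of two lists of lists.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def self_attention(x):
    # Sublayer 1 (bottom): scaled dot-product self-attention.
    # Assumption: Q = K = V = x, i.e. no learned projection matrices.
    d = len(x[0])
    scores = [[sum(q * k for q, k in zip(qi, kj)) / math.sqrt(d) for kj in x]
              for qi in x]
    weights = [softmax(r) for r in scores]
    return matmul(weights, x)

def feedforward(x, w1, w2):
    # Sublayer 2 (top): position-wise feedforward network,
    # two linear maps with a ReLU in between (biases omitted).
    hidden = [[max(0.0, v) for v in row] for row in matmul(x, w1)]
    return matmul(hidden, w2)

def layer(x, w1, w2):
    # One layer = self-attention sublayer, then feedforward sublayer.
    return feedforward(self_attention(x), w1, w2)

def stack(x, params):
    # Every layer has the same structure; only the weights differ.
    for w1, w2 in params:
        x = layer(x, w1, w2)
    return x

# Toy input: 3 tokens, model dimension 2, hidden dimension 4.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w1 = [[0.5, -0.5, 1.0, 0.0], [0.0, 1.0, -1.0, 0.5]]
w2 = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 1.0]]
out = stack(x, [(w1, w2), (w1, w2)])
print(len(out), len(out[0]))
```

Because each layer maps a sequence of vectors to a sequence of the same shape, the same `layer` function can be applied repeatedly, which is what makes the identical structure convenient to stack.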