Sublayer
Each layer contains sublayers, as shown in Figure I.2. The sublayers share an identical structure across all layers, which facilitates hardware optimization.
The original Transformer layer contains two sublayers that run from bottom to top:
- A self-attention sublayer, designed specifically for NLP and well suited to hardware optimization
- A classical feedforward network with some tweaking
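To make the two-sublayer structure concrete, here is a minimal NumPy sketch of one such layer, not the book's code: a single-head self-attention sublayer followed by a position-wise feedforward sublayer, each wrapped in the residual connection and layer normalization used in the original Transformer. All names (`transformer_layer`, `Wq`, `W1`, and so on) are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-6):
    # Normalize each position's vector to zero mean, unit variance
    mu = x.mean(axis=-1, keepdims=True)
    sigma = x.std(axis=-1, keepdims=True)
    return (x - mu) / (sigma + eps)

def self_attention(x, Wq, Wk, Wv):
    # Sublayer 1: single-head scaled dot-product self-attention
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores) @ v

def feedforward(x, W1, W2):
    # Sublayer 2: position-wise feedforward network with ReLU
    return np.maximum(0.0, x @ W1) @ W2

def transformer_layer(x, p):
    # Each sublayer is wrapped in a residual connection + layer norm;
    # stacking N copies of this function gives N identical layers
    x = layer_norm(x + self_attention(x, p["Wq"], p["Wk"], p["Wv"]))
    x = layer_norm(x + feedforward(x, p["W1"], p["W2"]))
    return x

rng = np.random.default_rng(0)
d_model, d_ff, seq_len = 8, 32, 4
params = {
    "Wq": rng.normal(size=(d_model, d_model)),
    "Wk": rng.normal(size=(d_model, d_model)),
    "Wv": rng.normal(size=(d_model, d_model)),
    "W1": rng.normal(size=(d_model, d_ff)),
    "W2": rng.normal(size=(d_ff, d_model)),
}
x = rng.normal(size=(seq_len, d_model))
out = transformer_layer(x, params)
print(out.shape)  # (4, 8): the output keeps the input shape
```

Because every layer applies the same two sublayers with the same shapes, the output has the same shape as the input, so layers can be stacked freely; this uniformity is what makes the architecture easy to optimize on parallel hardware.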
![](https://static.packt-cdn.com/products/9781803247335/graphics/Images/B17948_Appendix_I_02.png)
Figure I.2: A layer contains two sublayers