Let's put our newly acquired knowledge to the test. Try answering the following questions:
- What are the steps involved in the self-attention mechanism?
- What is scaled dot product attention?
- How do we create the query, key, and value matrices?
- Why do we need positional encoding?
- What are the sublayers of the decoder?
- What are the inputs to the encoder-decoder attention layer of the decoder?
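After you have attempted the questions, you can check the attention-related ones against the following minimal NumPy sketch. It walks through the steps we covered in this chapter: adding sinusoidal positional encoding to the input embeddings, creating the query, key, and value matrices by multiplying the embeddings with three separate weight matrices, and computing scaled dot-product attention. Note that the weight matrices here are random stand-ins for the learned parameters, purely for illustration:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Score each query against every key, scale by sqrt(d_k) so the dot
    # products don't grow too large, then softmax and use the weights
    # to take a weighted sum of the value vectors.
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V

def sinusoidal_positional_encoding(seq_len, d_model):
    # Even dimensions use sine and odd dimensions use cosine, each at a
    # different frequency, giving every position a distinct encoding.
    positions = np.arange(seq_len)[:, None]
    dims = np.arange(d_model)[None, :]
    angle_rates = 1 / np.power(10000, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])
    pe[:, 1::2] = np.cos(angles[:, 1::2])
    return pe

# Toy example: 4 tokens, embedding dimension 8.
seq_len, d_model = 4, 8
rng = np.random.default_rng(0)
X = rng.normal(size=(seq_len, d_model))                    # input embeddings
X = X + sinusoidal_positional_encoding(seq_len, d_model)   # add position info

# Q, K, V come from multiplying the (position-aware) embeddings by three
# separate weight matrices; in a real model these are learned, here random.
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
Q, K, V = X @ W_q, X @ W_k, X @ W_v

Z = scaled_dot_product_attention(Q, K, V)
print(Z.shape)  # (4, 8): one attention output vector per input token
```

For the decoder questions, recall that each decoder block stacks three sublayers (masked multi-head self-attention, encoder-decoder attention, and a feedforward network), and that the encoder-decoder attention layer takes its queries from the previous decoder sublayer while its keys and values come from the encoder's output representation.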