Let's put our knowledge to the test by answering the following questions:
- What is knowledge distillation?
- What are soft targets and soft predictions?
- Define distillation loss.
- What is DistilBERT used for?
- What is the loss function of DistilBERT?
- How does transformer layer distillation work?
- How does prediction layer distillation work?