Let's put our knowledge to the test. Try answering the following questions:
- How do you use the pre-trained BERT model?
- What is the use of the [PAD] token?
- What is an attention mask?
- What is fine-tuning?
- How do you compute the starting index of an answer in question answering?
- How do you compute the ending index of an answer in question answering?
- How do you use BERT for NER?