Let's assess the knowledge gained in this chapter. Try answering the following questions:
- What is VideoBERT used for?
- How is VideoBERT pre-trained?
- How does linguistic-visual alignment differ from next sentence prediction?
- Define the text-only training objective.
- Define the video-only training objective.
- What is BART?
- Explain token masking and token deletion.