Vision Transformers
Transformers have achieved significant results in natural language processing (NLP) and, as you have seen in the previous chapters, they perform well across many different tasks. In this chapter, we will explore Vision Transformer (ViT) models. Just as a wide range of models has been created for NLP, various transformer-based models have also been created for vision, each offering a new perspective on computer vision problems. By reading this chapter, you will learn how to use models such as ViT for computer vision tasks, how pretrained transformer-based computer vision models work, and how to fine-tune them for specific tasks. We will also look at prompt-based models, which produce results based on the visual prompts given to them.
The following topics will be covered in this chapter:
- Vision transformers
- Image classification using transformers
- Semantic segmentation and object detection using transformers...