Vision transformers
Transformers solved many problems in the NLP domain and provided good results. However, some researchers started to apply them to other domains rather than text-only ones. Computer vision is one of the fields in which transformers are actively used. In this section, you will learn how it is possible to apply transformers to computer vision. This was one of the essential problems in the early days of adaptation. NLP is composed of text-only data that is mostly seen as a time series character-based input with a respective output. However, computer vision problems come with an image as input followed by the desired output, which can be in the form of numeric values or a matrix. A gray-scale image is represented as a matrix with respective width and height values. The easiest way to see it is as a black-and-white image:
Figure 16.1 – A black and white image of character A
As you can see in Figure 16.1, each cell in the matrix resembles...