Beyond Text: Vision Transformers in the Dawn of Revolutionary AI
Up to now, we have examined variations of the Original Transformer model with encoder and decoder layers. We have also explored other models with encoder-only or decoder-only stacks of layers. The size of the layers and the number of parameters have also increased. However, the fundamental architecture of the Transformer retains its original structure: a stack of identical layers in which the attention heads are computed in parallel.
In this chapter, we will explore innovative transformer models that respect the basic structure of the Original Transformer but make some significant changes. Scores of transformer models will appear, much like the many constructions you can build from a single box of LEGO® pieces. You can assemble those pieces in hundreds of ways! Transformer sublayers and layers are the LEGO® pieces of advanced AI.
We will discover powerful computer vision transformers like ViT, CLIP, DALL-E, and GPT-4V. We can add...