Evolving language models – the AR Transformer and its role in GenAI
In Chapter 2, we reviewed some of the generative paradigms that apply a transformer-based approach. Here, we trace the evolution of transformers more closely, outlining some of the most impactful transformer-based language models, from the original transformer in 2017 to recent state-of-the-art models that demonstrate the scalability, versatility, and societal considerations involved in this fast-moving domain of AI (as illustrated in Figure 3.3):
Figure 3.3: From the original transformer to GPT-4
- 2017 – Transformer: The transformer model, introduced by Vaswani et al., was a paradigm shift in NLP, featuring self-attention layers that process an entire input sequence in parallel. This architecture lets the model weigh the importance of each word in a sentence relative to all other words, thereby enhancing its ability to capture the context...
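The self-attention mechanism described above can be sketched in a few lines of NumPy. This is a minimal illustration of single-head scaled dot-product attention, not the full multi-head implementation from Vaswani et al.; the projection matrices and dimensions below are arbitrary example values.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (illustrative sketch).

    x: (seq_len, d_model) token embeddings.
    w_q, w_k, w_v: (d_model, d_model) learned projection matrices.
    """
    # Project the input into queries, keys, and values
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d_k = q.shape[-1]
    # Scores: how strongly each position attends to every other position,
    # scaled by sqrt(d_k) to keep softmax gradients well-behaved
    scores = q @ k.T / np.sqrt(d_k)
    # Numerically stable softmax over each row (rows sum to 1)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output is a weighted mix of all value vectors in the sequence
    return weights @ v, weights

# Toy example: 4 tokens, 8-dimensional embeddings (arbitrary sizes)
rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
x = rng.normal(size=(seq_len, d_model))
w_q = rng.normal(size=(d_model, d_model))
w_k = rng.normal(size=(d_model, d_model))
w_v = rng.normal(size=(d_model, d_model))
out, weights = self_attention(x, w_q, w_k, w_v)
```

Because every score in the `seq_len x seq_len` matrix is computed at once, all pairwise word interactions are evaluated in parallel, which is precisely what freed transformers from the sequential bottleneck of recurrent models.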