8
Exploring Large Language Models in Depth
In recent years, interest in transformers has skyrocketed in the academic world, industry, and even the general public. The state-of-the-art transformer-based architectures today are called large language models (LLMs). The most captivating feature is their text-generation capabilities, and the most popular example is ChatGPT (https://chat.openai.com/). But in their core lies the humble transformer we introduced in Chapter 7. Luckily, we already have a solid foundation of transformers. One remarkable aspect of this architecture is that it has changed little in the years since it was introduced. Instead, the capabilities of LLMs have grown with their size (the name gives it away), lending credibility to the phrase quantitative change leads to qualitative change.
The success of LLMs has further fueled the research in the area (or is it the other way around?). On the one hand, large industrial labs (such as Google, Meta, Microsoft, or OpenAI...