Evolution of LLM Architectures
The development of language model architectures has undergone a transformative journey, as shown in Figure 1.5, tracing its origins from simple word embeddings to sophisticated models capable of understanding and generating multimodal content. This progression is depicted in Figure 1.5's "LLM Evolutionary Tree," which starts from foundational models released before 2018, such as FastText, GloVe, and Word2Vec, and extends to the latest advancements, such as the LLaMA series and Google's Bard.
Let's look at this evolution in a bit more detail:
Early Foundations: Word Embeddings
Initially, models like FastText, GloVe, and Word2Vec represented words as vectors in high-dimensional space, capturing semantic and syntactic similarities based on their co-occurrence in large text corpora. These embeddings provided a static representation of words, serving as the backbone for many early natural language processing systems.
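The idea that related words sit close together in embedding space can be illustrated with a small sketch. The vectors below are toy, hand-picked values (real Word2Vec or GloVe embeddings typically have 100-300 dimensions learned from corpus co-occurrence statistics), but the comparison via cosine similarity works the same way:

```python
import math

# Hypothetical 4-dimensional word vectors for illustration only;
# in practice these would be loaded from a pretrained embedding model.
embeddings = {
    "king":  [0.80, 0.65, 0.10, 0.05],
    "queen": [0.75, 0.70, 0.12, 0.04],
    "apple": [0.05, 0.10, 0.90, 0.70],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

sim_royal = cosine_similarity(embeddings["king"], embeddings["queen"])
sim_fruit = cosine_similarity(embeddings["king"], embeddings["apple"])

# Semantically related words end up closer in the vector space.
print(sim_royal > sim_fruit)  # True
```

Because these embeddings are static, a word like "bank" gets the same vector in every sentence, a limitation that later contextual models were designed to overcome.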