Evolving language models – the AR Transformer and its role in GenAI
In Chapter 2, we reviewed some of the generative paradigms that apply a transformer-based approach. Here, we trace the evolution of transformers more closely, outlining some of the most impactful transformer-based language models, from the original transformer in 2017 to recent state-of-the-art models that demonstrate the scalability, versatility, and societal considerations involved in this fast-moving domain of AI (as illustrated in Figure 3.3):
Figure 3.3: From the original transformer to GPT-4
- 2017 – Transformer: The transformer model, introduced by Vaswani et al., was a paradigm shift in NLP, featuring self-attention layers that process an entire input sequence in parallel. This architecture lets the model weigh the importance of each word in a sentence relative to all other words, thereby enhancing its ability to capture the context...
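The self-attention mechanism described above can be sketched in a few lines of NumPy. This is a minimal illustration of single-head scaled dot-product attention, not the full multi-head implementation from Vaswani et al.; the projection matrices and dimensions below are arbitrary example values.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (illustrative sketch).

    x: (seq_len, d_model) token embeddings.
    w_q, w_k, w_v: (d_model, d_model) learned projection matrices.
    """
    # Project the input into queries, keys, and values
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d_k = q.shape[-1]
    # Scores: how strongly each position attends to every other position,
    # scaled by sqrt(d_k) to keep softmax gradients well-behaved
    scores = q @ k.T / np.sqrt(d_k)
    # Numerically stable softmax over each row (rows sum to 1)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output is a weighted mix of all value vectors in the sequence
    return weights @ v, weights

# Toy example: 4 tokens, 8-dimensional embeddings (arbitrary sizes)
rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
x = rng.normal(size=(seq_len, d_model))
w_q = rng.normal(size=(d_model, d_model))
w_k = rng.normal(size=(d_model, d_model))
w_v = rng.normal(size=(d_model, d_model))
out, weights = self_attention(x, w_q, w_k, w_v)
```

Because every score in the `seq_len x seq_len` matrix is computed at once, all pairwise word interactions are evaluated in parallel, which is precisely what freed transformers from the sequential bottleneck of recurrent models.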