Python Deep Learning - Third Edition

Author: Ivan Vasilev
Publisher: Packt
Published: November 2023
ISBN-13: 9781837638505
Pages: 362
Edition: 3rd
Table of Contents (17 chapters)

Preface
Part 1: Introduction to Neural Networks
Chapter 1: Machine Learning – an Introduction
Chapter 2: Neural Networks
Chapter 3: Deep Learning Fundamentals
Part 2: Deep Neural Networks for Computer Vision
Chapter 4: Computer Vision with Convolutional Networks
Chapter 5: Advanced Computer Vision Applications
Part 3: Natural Language Processing and Transformers
Chapter 6: Natural Language Processing and Recurrent Neural Networks
Chapter 7: The Attention Mechanism and Transformers
Chapter 8: Exploring Large Language Models in Depth
Chapter 9: Advanced Applications of Large Language Models
Part 4: Developing and Deploying Deep Neural Networks
Chapter 10: Machine Learning Operations (MLOps)
Index
Other Books You May Enjoy

LLM architecture

In Chapter 7, we introduced the multi-head attention (MHA) mechanism and the three major transformer variants: encoder-decoder, encoder-only, and decoder-only (we used BERT and GPT as the prototypical encoder-only and decoder-only models). In this section, we'll discuss the building blocks of the LLM architecture. Let's start by focusing our attention (yes, it's the same old joke) on the attention mechanism.
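As a quick refresher before we look at the variants, here is a minimal sketch of the scaled dot-product attention at the core of MHA. This is an illustrative PyTorch snippet, not the book's own code; the function name and the boolean mask convention are assumptions for this example:

import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """Compute attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V."""
    # q, k, v: tensors of shape (..., seq_len, head_dim)
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    if mask is not None:
        # mask: boolean matrix, True where a query may attend to a key
        scores = scores.masked_fill(~mask, float("-inf"))
    weights = torch.softmax(scores, dim=-1)  # each row sums to 1
    return weights @ v

The mask argument is what distinguishes the attention variants we discuss next: the same computation serves bidirectional, causal, and sparse attention, depending only on which query-key pairs the mask allows.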

LLM attention variants

The attention mechanism we've discussed so far is known as global attention. The following diagram displays the connectivity matrix of a bidirectional global self-attention mechanism with a context window of size n=8:

Figure 8.1 – Global self-attention with a context window of size n=8

Each row and column represent the full input token sequence, $[\mathbf{t}_1 \ldots \mathbf{t}_8]$. The dotted colored diagonal cells represent the current input token (query), $\mathbf{t}_i$. The uninterrupted colored cells of each column represent all tokens...
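To make the connectivity matrix of Figure 8.1 concrete, the following sketch (an illustrative example, not from the book) builds the mask for bidirectional global self-attention over a context window of n=8: every query attends to every token, so the matrix is all ones. A causal, decoder-style mask is shown for contrast:

import torch

n = 8  # context window size, as in Figure 8.1

# Bidirectional global attention: every query attends to every token,
# so the connectivity matrix is fully populated.
global_mask = torch.ones(n, n, dtype=torch.bool)

# For contrast, a causal (decoder-style) mask keeps only the lower
# triangle: token t_i attends only to tokens t_1..t_i.
causal_mask = torch.tril(torch.ones(n, n, dtype=torch.bool))

print(global_mask.int())
print(causal_mask.int())

Either matrix can be passed as the mask argument of the attention sketch shown earlier; the quadratic n-by-n shape is exactly why the attention variants below try to sparsify this matrix.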
