LLM architecture
In Chapter 7, we introduced the multi-head attention (MHA) mechanism and the three major transformer variants: encoder-decoder, encoder-only, and decoder-only (we used BERT and GPT as prototypical encoder-only and decoder-only models, respectively). In this section, we'll discuss the various components of the LLM architecture. Let's start by focusing our attention (yes, it's the same old joke) on the attention mechanism.
LLM attention variants
The attention mechanism we have discussed so far is known as global attention. The following diagram displays the connectivity matrix of a bidirectional global self-attention mechanism with a context window of size n=8:
Figure 8.1 – Global self-attention with a context window of size n=8
The rows and columns both represent the full input token sequence, $[t_1 \dots t_n]$ (here, n=8). The dotted colored diagonal cells represent the current input token (query), $t_i$. The uninterrupted colored cells of each column represent all tokens...
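To make the connectivity matrix concrete, here is a minimal PyTorch sketch (our own illustration, not taken from any specific library) that builds the n=8 connectivity matrix for bidirectional global self-attention, where every query attends to every token in the window, and contrasts it with the causal mask of a decoder-only model, where each query attends only to itself and the tokens that precede it:

import torch

n = 8  # context window size

# Bidirectional global self-attention: every query token attends to every
# token in the window, so the connectivity matrix is all ones (no masking).
bidirectional_mask = torch.ones(n, n, dtype=torch.bool)

# Unidirectional (causal) global attention, as used in decoder-only models:
# the query at position i attends only to positions j <= i (lower triangle).
causal_mask = torch.tril(torch.ones(n, n, dtype=torch.bool))

# When computing attention, masked-out positions are set to -inf before the
# softmax, so they receive zero attention weight.
scores = torch.randn(n, n)  # stand-in for the scaled QK^T attention scores
scores = scores.masked_fill(~causal_mask, float("-inf"))
weights = torch.softmax(scores, dim=-1)  # each row sums to 1

Printing bidirectional_mask as integers reproduces the fully connected pattern of Figure 8.1, while causal_mask shows the lower-triangular pattern we will contrast it with when discussing decoder-only attention.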