Pretrain Vision and Large Language Models in Python

End-to-end techniques for building and deploying foundation models on AWS

Product type Paperback
Published in May 2023
Publisher Packt
ISBN-13 9781804618257
Length 258 pages
Edition 1st Edition
Author: Emily Webber

Table of Contents

Preface
Part 1: Before Pretraining
  Chapter 1: An Introduction to Pretraining Foundation Models
  Chapter 2: Dataset Preparation: Part One
  Chapter 3: Model Preparation
Part 2: Configure Your Environment
  Chapter 4: Containers and Accelerators on the Cloud
  Chapter 5: Distribution Fundamentals
  Chapter 6: Dataset Preparation: Part Two, the Data Loader
Part 3: Train Your Model
  Chapter 7: Finding the Right Hyperparameters
  Chapter 8: Large-Scale Training on SageMaker
  Chapter 9: Advanced Training Concepts
Part 4: Evaluate Your Model
  Chapter 10: Fine-Tuning and Evaluating
  Chapter 11: Detecting, Mitigating, and Monitoring Bias
  Chapter 12: How to Deploy Your Model
Part 5: Deploy Your Model
  Chapter 13: Prompt Engineering
  Chapter 14: MLOps for Vision and Language
  Chapter 15: Future Trends in Pretraining Foundation Models
Index
Other Books You May Enjoy

Encoders and decoders

Now, I’d like to briefly introduce two key concepts that you’ll see in discussions of transformer-based models: encoders and decoders. Let’s establish some basic intuition for what they are. An encoder is simply a computational graph (or neural network, function, or object, depending on your background) that takes an input with a larger feature space and returns an object with a smaller feature space. We hope (and demonstrate computationally) that the encoder learns what is most essential about the provided input data.
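To make the "larger feature space in, smaller feature space out" idea concrete, here is a minimal sketch in plain Python. The `make_encoder` helper, the dimensions, and the random weights are all illustrative assumptions: a real encoder learns its weights during training, whereas this one only demonstrates the change in shape.

```python
import math
import random


def make_encoder(in_dim, out_dim, seed=0):
    """Build a toy encoder: a fixed random linear map followed by tanh.

    Hypothetical helper for illustration only. The weights are random,
    not learned, so the output is not a meaningful representation.
    """
    rng = random.Random(seed)
    weights = [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)]

    def encode(x):
        if len(x) != in_dim:
            raise ValueError(f"expected {in_dim} features, got {len(x)}")
        # Each output feature is a squashed weighted sum of all input features.
        return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]

    return encode


encoder = make_encoder(in_dim=8, out_dim=3)
features = [0.5, -1.2, 3.0, 0.0, 0.7, -0.3, 1.1, 2.4]
latent = encoder(features)
print(len(features), "->", len(latent))  # prints "8 -> 3"
```

The only property worth noticing is the signature: eight numbers go in, three come out. Training is what would make those three numbers capture the essentials of the input.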

Typically, in large language and vision models, the encoder is composed of a number of multi-head self-attention blocks. In transformer-based models, then, an encoder is usually a sequence of self-attention steps that learn what is most essential about the input data and pass it on to the downstream model. Let’s look at a quick visual:
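As a rough illustration of an encoder built from stacked multi-head self-attention steps, the sketch below implements scaled dot-product attention in plain Python. It rests on several assumptions: the weights are random rather than trained, the sizes (`d_model`, `n_heads`, `n_layers`) are arbitrary, and the residual connections, layer normalization, and feed-forward sublayers found in real transformer encoders are left out for brevity.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0, 0.5) for _ in range(cols)] for _ in range(rows)]


def attention_head(x, wq, wk, wv):
    """One head: every token attends to every token in the same sequence."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(q[0]))
    scores = [[sum(a * b for a, b in zip(qi, kj)) / scale for kj in k] for qi in q]
    weights = [softmax(r) for r in scores]
    return matmul(weights, v), weights


def multi_head_self_attention(x, heads, wo):
    """Run each head, concatenate their outputs per token, then project."""
    outputs = [attention_head(x, *head)[0] for head in heads]
    concat = [[v for out in outputs for v in out[i]] for i in range(len(x))]
    return matmul(concat, wo)


def make_layer(d_model, n_heads, rng):
    head_dim = d_model // n_heads
    heads = [
        tuple(random_matrix(d_model, head_dim, rng) for _ in range(3))
        for _ in range(n_heads)
    ]
    return heads, random_matrix(d_model, d_model, rng)


rng = random.Random(0)
d_model, n_heads, n_layers, seq_len = 8, 2, 3, 5
tokens = random_matrix(seq_len, d_model, rng)  # stand-in for token embeddings
layers = [make_layer(d_model, n_heads, rng) for _ in range(n_layers)]

hidden = tokens
for heads, wo in layers:  # the "number of self-attention steps"
    hidden = multi_head_self_attention(hidden, heads, wo)

print(len(hidden), len(hidden[0]))  # prints "5 8"
```

Each layer takes the previous layer's output as its input, which is all that "stacking" means here. In practice you would reach for a framework implementation rather than hand-written loops.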

Figure 1.1 – Encoders and decoders


Intuitively, as you can see in the preceding figure, the encoder starts with a larger input space and iteratively compresses this to a smaller latent space. In the case of classification, the final stage is just a classification head with one output allotted for each class. In the case of masked language modeling, encoders are stacked on top of each other to better predict the tokens that replace the masks. The encoders output an embedding, or numerical representation, of each token, and after prediction, the tokenizer is reused to translate the prediction back into natural language.
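The masked language modeling flow can be sketched end to end with a toy vocabulary. Everything here is a hypothetical stand-in: the six-word vocabulary, the hand-written `output_embeddings`, and especially `hidden_at_mask`, which plays the role of the vector a trained encoder stack would produce at the masked position. Scoring the hidden vector against each token's embedding by dot product is also an assumption chosen for simplicity.

```python
# Toy "tokenizer": a fixed vocabulary mapping tokens to integer ids.
vocab = ["[MASK]", "the", "cat", "sat", "on", "mat"]
token_to_id = {token: i for i, token in enumerate(vocab)}


def encode(text):
    """Text to ids (whitespace split; real tokenizers are more involved)."""
    return [token_to_id[t] for t in text.split()]


def decode(ids):
    """Ids back to text, reusing the same vocabulary."""
    return " ".join(vocab[i] for i in ids)


# Hand-written 3-dimensional embedding per vocabulary entry (illustrative).
output_embeddings = [
    [0.0, 0.0, 0.0],  # [MASK]
    [1.0, 0.0, 0.0],  # the
    [0.0, 1.0, 0.0],  # cat
    [0.0, 0.0, 1.0],  # sat
    [1.0, 1.0, 0.0],  # on
    [0.0, 1.0, 1.0],  # mat
]

ids = encode("the cat sat on the [MASK]")
mask_position = ids.index(token_to_id["[MASK]"])

# Assumed encoder output at the masked position (not computed here).
hidden_at_mask = [0.1, 0.9, 0.8]

# One score per vocabulary entry: dot product with that entry's embedding.
logits = [sum(h * e for h, e in zip(hidden_at_mask, emb)) for emb in output_embeddings]
predicted_id = max(range(len(vocab)), key=lambda i: logits[i])

filled = ids[:mask_position] + [predicted_id] + ids[mask_position + 1:]
print(decode(filled))  # prints "the cat sat on the mat"
```

A classification head works the same way at the final step, except that the scores range over a fixed set of classes instead of the vocabulary.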

One of the earliest large language models, BERT, is an encoder-only model. Most other models in the BERT family, such as DeBERTa, DistilBERT, and RoBERTa, also use encoder-only architectures. Decoders operate in reverse, starting with a compressed representation and iteratively recomposing it into a larger feature space. Encoders and decoders can also be combined, as in the original Transformer, to solve text-to-text problems.
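One way to picture a decoder growing a short input into a longer output is a greedy generation loop, sketched below. The `NEXT_TOKEN` lookup table and `toy_next_token` function are hypothetical stand-ins for a trained decoder, which would score every vocabulary entry given all tokens so far. Only the loop structure is the point.

```python
# Hypothetical stand-in for a trained decoder's most likely next token.
NEXT_TOKEN = {
    "the": "cat",
    "cat": "sat",
    "sat": "on",
    "on": "a",
    "a": "mat",
    "mat": "<eos>",
}


def toy_next_token(tokens):
    """Pick the next token from the last one; a real model uses all of them."""
    return NEXT_TOKEN.get(tokens[-1], "<eos>")


def generate(prompt, max_new_tokens=10):
    """Append one predicted token at a time until <eos> or the length limit."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        next_token = toy_next_token(tokens)
        if next_token == "<eos>":
            break
        tokens.append(next_token)
    return tokens


prompt = ["the"]
completion = generate(prompt)
print(len(prompt), "->", len(completion))  # prints "1 -> 6"
print(" ".join(completion))  # prints "the cat sat on a mat"
```

The `max_new_tokens` cap is a common safeguard in generation loops, since a model is not guaranteed to emit an end-of-sequence token.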

To make this easier, here’s a short table summarizing the three types of self-attention blocks we’ve looked at: encoders, decoders, and their combination.

| Size of inputs and outputs | Type of self-attention blocks | Machine learning tasks | Example models |
| --- | --- | --- | --- |
| Long to short | Encoder | Classification, any dense representation | BERT, DeBERTa, DistilBERT, RoBERTa, XLM, ALBERT, CLIP, VL-BERT, Vision Transformer |
| Short to long | Decoder | Generation, summarization, question-answering, any sparse representation | GPT, GPT-2, GPT-Neo, GPT-J, ChatGPT, GPT-4, BLOOM, OPT |
| Equal | Encoder-decoder | Machine translation, style translation | T5, BART, BigBird, FLAN-T5, Stable Diffusion |

Table 1.3 – Encoders, decoders, and their combination

Now that you have a better understanding of encoders, decoders, and the models they create, let’s close out the chapter with a quick recap of all the concepts you just learned about.
