You're reading from Pretrain Vision and Large Language Models in Python: End-to-end techniques for building and deploying foundation models on AWS

Product type Paperback

Published in May 2023

Publisher Packt

ISBN-13 9781804618257

Length 258 pages

Edition 1st Edition

Languages

Python

Tools

AWS

Concepts

GPT/LLMs

Author (1):

Emily Webber

Enhancing your dataset – multilingual, multimodal, and augmentations

Now that you’ve learned how to pick a dataset, compare it with research datasets, determine the right approximate size, and evaluate bias, let’s dive into enhancing the dataset. In particular, we’ll look at three dimensions: multilingual, multimodal, and augmentations. All three typically come a bit later in your ML projects, especially after the first few versions of your models have been trained and you’re looking for the next idea to give you a boost.

Personally, I think there are few applications in the world where multilingual support isn’t a strong added value. Multilingual just means multiple languages. While many of the state-of-the-art language models were originally trained on English-only text, researchers in the last few years have made strong efforts to increase the linguistic diversity of these corpora. That means they’re adding support...