What this book covers
Chapter 1, An Introduction to Pretraining Foundation Models. In this chapter, you'll be introduced to foundation models, the backbone of many artificial intelligence and machine learning systems today. We will dive into their creation process, also called pretraining, and understand where pretraining gives you a competitive edge in improving the accuracy of your models. We will discuss the core transformer architecture underpinning state-of-the-art models like Stable Diffusion, BERT, Vision Transformers, CLIP, Flan-T5, and more. You will learn about the encoder and decoder frameworks and how they are applied to solve a variety of use cases.
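As a quick, illustrative taste of what's inside, here is a minimal sketch of scaled dot-product attention, the operation at the heart of the transformer blocks this chapter discusses; the tensor shapes are arbitrary and the function is our own toy version, not taken from any particular library.

```python
# A minimal sketch of scaled dot-product attention, the core operation
# inside transformer encoder and decoder blocks. Shapes are illustrative.
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (batch, seq_len, d_model)
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5   # (batch, seq, seq)
    weights = F.softmax(scores, dim=-1)             # attention weights
    return weights @ v                              # weighted sum of values

q = k = v = torch.randn(2, 8, 64)   # toy batch: 2 sequences of length 8
out = scaled_dot_product_attention(q, k, v)
print(out.shape)                     # torch.Size([2, 8, 64])
```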
Chapter 2, Dataset Preparation: Part One. In this chapter, we begin to discuss what you'll need in your dataset to start a meaningful pretraining project. This is the first of two parts on dataset preparation. It opens with some business guidance on finding a good use case for foundation modeling, where the data becomes instrumental. Then, focusing on the content of your dataset, we use qualitative and quantitative measures to compare it with the datasets used to pretrain other top models. You'll learn how to use the scaling laws to determine whether your datasets are "large enough" and "good enough" to boost accuracy during pretraining. We discuss bias identification and mitigation, along with multilingual and multimodal solutions.
Chapter 3, Model Preparation. In this chapter, you'll learn how to pick which model will be most useful to serve as a basis for your pretraining regime. You'll learn how to think about the size of the model in parameters, along with the key loss functions and how they determine performance in production. You'll combine the scaling laws with your expected dataset size to select ceiling and basement model sizes, which you'll use to guide your experiments.
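To preview the kind of back-of-the-envelope reasoning this chapter teaches, here is a rough sketch that turns a token budget into ceiling and basement parameter counts; the 20-tokens-per-parameter ratio is the heuristic popularized by the Chinchilla scaling laws, and the factor of 10 for the basement is purely illustrative.

```python
# A rough sketch of using a scaling-law heuristic to bound model size.
# The ratio and dataset size below are illustrative, not prescriptive.
TOKENS_PER_PARAM = 20          # assumed compute-optimal ratio (Chinchilla-style)

def compute_optimal_params(num_tokens: float) -> float:
    """Parameters a given token budget could support at the assumed ratio."""
    return num_tokens / TOKENS_PER_PARAM

dataset_tokens = 50e9                                  # e.g., a 50B-token corpus
ceiling = compute_optimal_params(dataset_tokens)       # upper bound for experiments
basement = ceiling / 10                                # illustrative lower bound

print(f"ceiling ~{ceiling / 1e9:.1f}B params, basement ~{basement / 1e9:.2f}B params")
```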
Chapter 4, Containers and Accelerators on the Cloud. In this chapter, you'll learn how to containerize your scripts and optimize them for accelerators on the cloud. We'll learn about a range of accelerators for foundation models, including tradeoffs around cost and performance across the entire machine learning lifecycle. You'll learn key aspects of Amazon SageMaker and AWS to train models on accelerators, optimize performance, and troubleshoot common issues. If you're already familiar with using accelerators on AWS, feel free to skip this chapter.
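As a preview, here is a minimal sketch of launching a containerized training script on GPU accelerators with the SageMaker Python SDK; the entry point, IAM role, bucket, instance type, and framework versions are placeholders you would adapt to your own account, and you should check the SDK documentation for currently supported version combinations.

```python
# A minimal sketch of running a containerized training script on GPU
# accelerators with the SageMaker Python SDK. All names are placeholders.
import sagemaker
from sagemaker.pytorch import PyTorch

session = sagemaker.Session()
estimator = PyTorch(
    entry_point="train.py",            # your containerized training script
    role="YOUR_SAGEMAKER_EXECUTION_ROLE",
    instance_count=1,
    instance_type="ml.p4d.24xlarge",   # 8 x NVIDIA A100 GPUs
    framework_version="1.13",          # illustrative; verify supported versions
    py_version="py39",
    sagemaker_session=session,
)
estimator.fit({"train": "s3://your-bucket/train/"})
```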
Chapter 5, Distribution Fundamentals. In this chapter, you'll learn the conceptual fundamentals of the distribution techniques you need for large-scale pretraining and fine-tuning. First, you'll master the top distribution concepts for machine learning, notably model and data parallelism. Then, you'll learn how Amazon SageMaker integrates with distribution software to run your job on as many GPUs as you need. You'll learn how to optimize model and data parallelism for large-scale training, especially with techniques like sharded data parallelism. Then, you'll learn how to reduce your memory consumption with advanced techniques like optimizer state sharding, activation checkpointing, compilation, and more. Lastly, we'll look at a few examples across language, vision, and more to bring all of these concepts together.
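To make the core idea concrete, here is a bare-bones sketch of plain data parallelism with PyTorch's DistributedDataParallel; it is not the SageMaker-specific tooling covered in the chapter, just the underlying pattern, with a toy model and illustrative sizes.

```python
# A bare-bones sketch of data parallelism with PyTorch DDP: each GPU holds a
# model replica, and gradients are averaged across replicas in backward().
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")          # one process per GPU
    local_rank = int(os.environ["LOCAL_RANK"])       # set by torchrun
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(1024, 1024).cuda(local_rank)   # toy model
    model = DDP(model, device_ids=[local_rank])             # replicate + all-reduce grads

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    x = torch.randn(32, 1024).cuda(local_rank)       # toy local batch
    loss = model(x).pow(2).mean()
    loss.backward()                                  # gradients averaged across GPUs
    optimizer.step()
    dist.destroy_process_group()

if __name__ == "__main__":
    main()   # launch with: torchrun --nproc_per_node=8 this_script.py
```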
Chapter 6, Dataset Preparation: Part Two, the Data Loader. In this chapter, you'll learn how to prepare your dataset for immediate use with your chosen models. You'll master the concept of a data loader and understand why it's a common source of error when training large models. You'll learn about creating embeddings, and about using tokenizers and other methods to featurize your raw data for your preferred neural network. Following these steps, you'll be able to prepare your entire dataset, using methods for both vision and language. Finally, you'll learn about data optimizations on AWS and Amazon SageMaker to efficiently send datasets large and small to your training cluster. Throughout this chapter, we'll work backwards from the training loop, giving you, step by step, everything you need to have functional deep neural networks training at scale. You'll also follow a case study of how I trained on 10 TB of data for Stable Diffusion on SageMaker!
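As a small preview of the data loader pattern, here is a minimal sketch that tokenizes raw text with a Hugging Face tokenizer and wraps it in a PyTorch DataLoader; the model name, example texts, and sequence length are placeholders.

```python
# A minimal sketch of the path from raw text to training batches:
# tokenize, wrap in a Dataset, and hand it to a DataLoader.
import torch
from torch.utils.data import Dataset, DataLoader
from transformers import AutoTokenizer

class TextDataset(Dataset):
    def __init__(self, texts, tokenizer, max_length=128):
        self.enc = tokenizer(texts, truncation=True, padding="max_length",
                             max_length=max_length, return_tensors="pt")
    def __len__(self):
        return self.enc["input_ids"].size(0)
    def __getitem__(self, idx):
        return {k: v[idx] for k, v in self.enc.items()}

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
dataset = TextDataset(["a tiny example", "another example"], tokenizer)
loader = DataLoader(dataset, batch_size=2, shuffle=True)

for batch in loader:
    print(batch["input_ids"].shape)   # torch.Size([2, 128])
```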
Chapter 7, Finding the Right Hyperparameters. In this chapter, you'll dive into the key hyperparameters that govern performance for top vision and language models, such as batch size, learning rate, and more. First, we'll start with a quick overview of hyperparameter tuning for those who are new to it or need a light refresher, including key examples in vision and language. Then we'll explore hyperparameter tuning in foundation models, both what is possible today and where trends might emerge. Finally, we'll learn how to do this on Amazon SageMaker, incrementally stepping up the cluster size and changing each hyperparameter as we do.
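To give a flavor of what this looks like in practice, here is a hedged sketch of SageMaker automatic model tuning over learning rate and batch size; the estimator is assumed to be one like the Chapter 4 sketch earlier, and the objective metric name and regex are illustrative.

```python
# A sketch of SageMaker automatic model tuning over two key hyperparameters.
# `estimator` is assumed to be a SageMaker estimator like the one sketched
# for Chapter 4; the metric definition is illustrative.
from sagemaker.tuner import (HyperparameterTuner, ContinuousParameter,
                             CategoricalParameter)

ranges = {
    "learning_rate": ContinuousParameter(1e-5, 1e-3, scaling_type="Logarithmic"),
    "batch_size": CategoricalParameter([16, 32, 64]),
}

tuner = HyperparameterTuner(
    estimator=estimator,                       # estimator from the earlier sketch
    objective_metric_name="validation:loss",
    objective_type="Minimize",
    hyperparameter_ranges=ranges,
    metric_definitions=[{"Name": "validation:loss",
                         "Regex": "val_loss=([0-9\\.]+)"}],
    max_jobs=8,
    max_parallel_jobs=2,
)
tuner.fit({"train": "s3://your-bucket/train/"})
```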
Chapter 8, Large-Scale Training on SageMaker. In this chapter, we cover key features and functionality available with Amazon SageMaker for running highly optimized distributed training. You'll learn how to optimize your script for SageMaker training, along with key usability features. You'll also learn about backend optimizations for distributed training with SageMaker, like GPU health checks, resilient training, checkpointing, script mode, and more.
Chapter 9, Advanced Training Concepts. In this chapter, we will cover advanced training concepts at scale, like evaluating throughput, calculating model TFLOPS per device, compilation, and using the scaling laws to determine the right length of training time. In the previous chapter, you learned how to do large-scale training on SageMaker generally speaking; in this chapter, you'll learn about particularly complex and sophisticated techniques you can use to drive down the overall cost of your job. This lower cost directly translates to higher model performance, because it means you can train for longer on the same budget.
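As a preview of the throughput math, here is a back-of-the-envelope sketch of model TFLOPS per device using the common ~6 × parameters × tokens approximation for a transformer's forward and backward pass; every number below is illustrative.

```python
# A back-of-the-envelope sketch of model TFLOPS per device. Uses the common
# ~6 * parameters * tokens approximation for forward + backward FLOPs.
params = 13e9                 # 13B-parameter model (illustrative)
tokens_per_step = 2e6         # global batch size in tokens
step_time_s = 16.0            # measured seconds per training step
num_gpus = 64

total_flops_per_step = 6 * params * tokens_per_step
tflops_per_gpu = total_flops_per_step / step_time_s / num_gpus / 1e12
print(f"~{tflops_per_gpu:.0f} TFLOPS per GPU")
# Compare against the accelerator's peak (e.g., ~312 TFLOPS BF16 on an A100)
# to estimate hardware utilization.
```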
Chapter 10, Fine-Tuning and Evaluating. In this chapter, you'll learn how to fine-tune your model on use case-specific datasets, comparing its performance to that of off-the-shelf public models. You should be able to see a quantitative and qualitative boost from your pretraining regime. You'll dive into some examples from language, text, and everything in between. You'll also learn how to think about and design a human-in-the-loop evaluation system, including the same reinforcement learning from human feedback (RLHF) that makes ChatGPT tick! This chapter focuses on updating the trainable weights of the model. For techniques that mimic learning but don't update the weights, such as prompt tuning and standard retrieval-augmented generation, see Chapter 13 on prompt engineering or Chapter 15 on future trends.
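To preview the mechanics, here is a minimal sketch of fine-tuning a pretrained checkpoint with the Hugging Face Trainer; the checkpoint, toy dataset, and hyperparameters are placeholders rather than recommendations.

```python
# A minimal sketch of fine-tuning a pretrained checkpoint on labeled data
# with the Hugging Face Trainer. The toy dataset stands in for your own.
import torch
from torch.utils.data import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

class TinyDataset(Dataset):
    def __init__(self, texts, labels, tokenizer):
        self.enc = tokenizer(texts, truncation=True, padding=True,
                             return_tensors="pt")
        self.labels = labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, i):
        item = {k: v[i] for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)
train_ds = TinyDataset(["great product", "terrible product"], [1, 0], tokenizer)

args = TrainingArguments(output_dir="./finetuned", num_train_epochs=1,
                         per_device_train_batch_size=2, learning_rate=2e-5)
trainer = Trainer(model=model, args=args, train_dataset=train_ds)
trainer.train()
print(trainer.evaluate(train_ds))   # sanity check; use a held-out set in practice
```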
Chapter 11, Detecting, Mitigating, and Monitoring Bias. In this chapter, we'll analyze leading bias identification and mitigation strategies for large vision, language, and multimodal models. You'll learn about the concept of bias, both in a statistical sense and in terms of how it impacts human beings in critical ways. You'll understand key ways to quantify and remedy this in vision and language models, eventually landing on monitoring strategies that enable you to reduce any and all forms of harm when applying your foundation models.
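As a small, concrete example of the kind of quantification this chapter covers, here is a sketch of one common fairness metric, the demographic parity difference, computed on synthetic predictions.

```python
# A small sketch of one common bias metric, the demographic parity difference:
# the gap in positive prediction rates between two groups. Data is synthetic.
import numpy as np

preds  = np.array([1, 0, 1, 1, 0, 1, 0, 0])                  # model predictions
groups = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])  # protected attribute

rate_a = preds[groups == "a"].mean()
rate_b = preds[groups == "b"].mean()
print(f"positive rate A={rate_a:.2f}, B={rate_b:.2f}, "
      f"demographic parity difference={abs(rate_a - rate_b):.2f}")
```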
Chapter 12, How to Deploy Your Model. In this chapter, we'll introduce you to a variety of techniques for deploying your model, including real-time endpoints, serverless, batch options, and more. These concepts apply to many compute environments, but we'll focus on the capabilities available on AWS within Amazon SageMaker. We'll talk about why you should try to shrink the size of your model before deploying, along with techniques for this across vision and language. We'll also cover distributed hosting techniques for scenarios when you can't, or don't need to, shrink your model. Lastly, we'll explore model serving techniques and concepts that can help you optimize the end-to-end performance of your model.
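To preview one shrinking technique, here is a quick sketch of post-training dynamic quantization with PyTorch, which stores linear-layer weights as INT8; the model here is a toy stand-in for your own.

```python
# A quick sketch of shrinking a model before deployment with post-training
# dynamic quantization: linear-layer weights are stored as int8.
import os
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(768, 3072), nn.ReLU(), nn.Linear(3072, 768))
model.eval()

quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8)   # int8 weights for linear layers

def size_mb(m, path="tmp.pt"):
    torch.save(m.state_dict(), path)
    mb = os.path.getsize(path) / 1e6
    os.remove(path)
    return mb

print(f"fp32: {size_mb(model):.1f} MB, int8: {size_mb(quantized):.1f} MB")
```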
Chapter 13, Prompt Engineering. In this chapter, we'll dive into a special set of techniques called prompt engineering. You'll learn about these techniques at a high level, including how they are similar to and different from the other learning-based topics throughout this book. We'll explore examples across vision and language, and dive into key terms and success metrics. In particular, this chapter covers all of the tips and tricks for improving performance without updating the model weights. This means we'll be mimicking the learning process without necessarily changing any of the model parameters. This includes some advanced techniques like prompt and prefix tuning.
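As a taste of the techniques ahead, here is a tiny sketch of few-shot prompting, where behavior is steered entirely through examples in the prompt and no weights are updated; the reviews and labels are invented for illustration.

```python
# A tiny sketch of few-shot prompting: the model's weights never change; we
# only steer behavior through examples placed in the prompt itself.
examples = [
    ("The movie was a delight from start to finish.", "positive"),
    ("I walked out halfway through.", "negative"),
]
query = "The soundtrack alone was worth the ticket."

prompt = "Classify the sentiment of each review.\n\n"
for text, label in examples:
    prompt += f"Review: {text}\nSentiment: {label}\n\n"
prompt += f"Review: {query}\nSentiment:"

print(prompt)   # send this string to your LLM endpoint of choice
```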
Chapter 14, MLOps for Vision and Language. In this chapter, we'll introduce core concepts of operations and orchestration for machine learning, also known as MLOps. This includes building pipelines, continuous integration and deployment, promotion through environments, and more. We'll explore options for monitoring and human-in-the-loop auditing of model predictions. We'll also identify unique ways to support large vision and language models in your MLOps pipelines.
Chapter 15, Future Trends in Pretraining Foundation Models. In this chapter, we'll close out the book by pointing to where trends are headed for all of the relevant topics presented in this book. We'll explore trends in foundation model application development, like using LangChain to build interactive dialogue applications, along with techniques like retrieval-augmented generation to reduce LLM hallucination. We'll explore ways to use generative models to solve classification tasks, human-centered design, and other generative modalities like code, music, product documentation, PowerPoints, and more! We'll talk through AWS offerings like SageMaker JumpStart Foundation Models, Amazon Bedrock, Amazon Titan, and Amazon CodeWhisperer, and top trends in the future of foundation models and pretraining itself.