You're reading from Machine Learning Engineering on AWS Build, scale, and secure machine learning systems and MLOps pipelines in production

Product type Paperback

Published in Oct 2022

Publisher Packt

ISBN-13 9781803247595

Length 530 pages

Edition 1st Edition

Tools

AWS

Concepts

Machine Learning

Author (1):

Joshua Arvin Lat

View More author details

Table of Contents (19) Chapters

Preface

1. Part 1: Getting Started with Machine Learning Engineering on AWS

2. Chapter 1: Introduction to ML Engineering on AWS FREE CHAPTER

3. Chapter 2: Deep Learning AMIs

4. Chapter 3: Deep Learning Containers

5. Part 2:Solving Data Engineering and Analysis Requirements

6. Chapter 4: Serverless Data Management on AWS

7. Chapter 5: Pragmatic Data Processing and Analysis

8. Part 3: Diving Deeper with Relevant Model Training and Deployment Solutions

9. Chapter 6: SageMaker Training and Debugging Solutions

10. Chapter 7: SageMaker Deployment Solutions

11. Part 4:Securing, Monitoring, and Managing Machine Learning Systems and Environments

12. Chapter 8: Model Monitoring and Management Solutions

13. Chapter 9: Security, Governance, and Compliance Strategies

14. Part 5:Designing and Building End-to-end MLOps Pipelines

15. Chapter 10: Machine Learning Pipelines with Kubeflow on Amazon EKS

16. Chapter 11: Machine Learning Pipelines with SageMaker Pipelines

17. Index

Why subscribe?

18. Other Books You May Enjoy

Getting started with data processing and analysis

In the previous chapter, we utilized a data warehouse and a data lake to store, manage, and query our data. Data stored in these data sources generally must undergo a series of data processing and data transformation steps similar to those shown in Figure 5.1 before it can be used as a training dataset for ML experiments:

Figure 5.1 – Data processing and analysis

In Figure 5.1, we can see that these data processing steps may involve merging different datasets, along with cleaning, converting, analyzing, and transforming the data using a variety of options and techniques. In practice, data scientists and ML engineers generally spend a lot of hours cleaning the data and getting it ready for use in ML experiments. Some professionals may be used to writing and running custom Python or R scripts to perform this work. However, it may be more practical to use no-code or low-code solutions such as AWS Glue DataBrew...