You're reading from Machine Learning Engineering on AWS Build, scale, and secure machine learning systems and MLOps pipelines in production

Product type Paperback

Published in Oct 2022

Publisher Packt

ISBN-13 9781803247595

Length 530 pages

Edition 1st Edition

Tools

AWS

Concepts

Machine Learning

Author (1):

Joshua Arvin Lat

Table of Contents (19) Chapters

Preface

1. Part 1: Getting Started with Machine Learning Engineering on AWS

2. Chapter 1: Introduction to ML Engineering on AWS

3. Chapter 2: Deep Learning AMIs

4. Chapter 3: Deep Learning Containers

5. Part 2: Solving Data Engineering and Analysis Requirements

6. Chapter 4: Serverless Data Management on AWS

7. Chapter 5: Pragmatic Data Processing and Analysis

8. Part 3: Diving Deeper with Relevant Model Training and Deployment Solutions

9. Chapter 6: SageMaker Training and Debugging Solutions

10. Chapter 7: SageMaker Deployment Solutions

11. Part 4: Securing, Monitoring, and Managing Machine Learning Systems and Environments

12. Chapter 8: Model Monitoring and Management Solutions

13. Chapter 9: Security, Governance, and Compliance Strategies

14. Part 5: Designing and Building End-to-end MLOps Pipelines

15. Chapter 10: Machine Learning Pipelines with Kubeflow on Amazon EKS

16. Chapter 11: Machine Learning Pipelines with SageMaker Pipelines

17. Index

18. Other Books You May Enjoy

Setting up Lake Formation

Now, it’s time to take a closer look at setting up our serverless data lake on AWS! Before we begin, let’s define what a data lake is and what type of data is stored in it. A data lake is a centralized data store that contains a variety of structured, semi-structured, and unstructured data from different data sources. As shown in the following diagram, data can be stored in a data lake without having to define its structure and format up front. We can use a variety of file types, such as JSON, CSV, and Apache Parquet, when storing data in a data lake. In addition, a data lake may hold both raw and processed (clean) data:

Figure 4.26 – Getting started with data lakes
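
To make the idea of mixed formats and raw versus processed data more concrete, here is a minimal sketch that uses a temporary local directory as a stand-in for the data lake storage. It does not use Lake Formation or any AWS service. The sample records, the raw/ and processed/ folder names, and the function names are all hypothetical and were chosen only for illustration:

```python
import csv
import json
import tempfile
from pathlib import Path

# Hypothetical sample records; one has a missing value to mimic raw data.
RECORDS = [
    {"id": 1, "city": "Manila", "bookings": 12},
    {"id": 2, "city": "Cebu", "bookings": None},
    {"id": 3, "city": "Davao", "bookings": 7},
]
FIELDS = ["id", "city", "bookings"]


def write_raw(lake_root):
    """Store the same records, unmodified, in two file formats under raw/."""
    raw_dir = lake_root / "raw"
    raw_dir.mkdir(parents=True, exist_ok=True)

    # JSON Lines: one JSON object per line.
    with open(raw_dir / "bookings.jsonl", "w", encoding="utf-8") as f:
        for record in RECORDS:
            f.write(json.dumps(record) + "\n")

    # CSV: the missing value (None) is written as an empty field.
    with open(raw_dir / "bookings.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(RECORDS)


def write_processed(lake_root):
    """Read the raw JSON Lines file, drop incomplete rows, save to processed/."""
    processed_dir = lake_root / "processed"
    processed_dir.mkdir(parents=True, exist_ok=True)

    with open(lake_root / "raw" / "bookings.jsonl", encoding="utf-8") as f:
        rows = [json.loads(line) for line in f if line.strip()]
    clean = [row for row in rows if row["bookings"] is not None]

    with open(processed_dir / "bookings.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(clean)
    return clean


def build_demo_lake():
    """Build the demo layout in a temporary directory and summarize it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        write_raw(root)
        clean = write_processed(root)
        files = sorted(
            p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
        )
    return files, clean


if __name__ == "__main__":
    files, clean = build_demo_lake()
    print(files)
    print(clean)
```

In this sketch, the raw files keep every record exactly as received, while the processed file contains only the complete rows. Apache Parquet is left out because writing it requires a third-party library.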

ML engineers and data scientists can use data lakes as the source of the data used for building and training ML models. Since the data stored in data lakes may be a mixture of both raw and clean data, additional data processing, data cleaning...