Data processing
The data processing layer of a data lake provides the frameworks and compute resources needed for tasks such as data correction, transformation, merging, splitting, and ML feature engineering. Common data processing frameworks include Python shell scripts and Apache Spark. The essential requirements for data processing technology are as follows:
- Integration and compatibility with the underlying storage technology: The ability to seamlessly work with the native storage system simplifies data access and movement between the storage and processing layers.
- Integration with the data catalog: The capability to interact with the data catalog's metastore for querying databases and tables within the catalog.
- Scalability: The capacity to scale compute resources up or down to accommodate changing data volumes and processing velocity requirements.
- Language and framework support: Support for popular data processing libraries and frameworks...
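To make the processing tasks listed above concrete, the following is a minimal, framework-agnostic sketch in plain Python. The record fields and cleaning rules are hypothetical, chosen only for illustration; in practice a framework such as Apache Spark would express the same steps as distributed DataFrame operations over data read from the lake's storage layer.

```python
from statistics import mean

# Hypothetical raw records; in a data lake these would be read from storage
# (e.g. Parquet files registered in the data catalog).
orders = [
    {"order_id": 1, "customer_id": "c1", "amount": "19.99"},
    {"order_id": 2, "customer_id": "c2", "amount": None},  # needs correction
    {"order_id": 3, "customer_id": "c1", "amount": "5.00"},
]
customers = [
    {"customer_id": "c1", "region": "EU"},
    {"customer_id": "c2", "region": "US"},
]

# Correction: drop records with a missing amount.
clean = [o for o in orders if o["amount"] is not None]

# Transformation: cast the amount field from string to float.
for o in clean:
    o["amount"] = float(o["amount"])

# Merging: join orders with customer attributes on customer_id.
regions = {c["customer_id"]: c["region"] for c in customers}
joined = [{**o, "region": regions[o["customer_id"]]} for o in clean]

# Splitting: partition the joined records by region,
# e.g. to write region-partitioned output files.
by_region = {}
for row in joined:
    by_region.setdefault(row["region"], []).append(row)

# Feature engineering: average order amount per customer.
per_customer = {}
for row in joined:
    per_customer.setdefault(row["customer_id"], []).append(row["amount"])
features = {cid: mean(vals) for cid, vals in per_customer.items()}
```

Each step has a direct Spark counterpart (`filter`, `withColumn`, `join`, `partitionBy`, `groupBy`/`agg`), which is what makes scalability a question of swapping the execution engine rather than rewriting the logic.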