Engineering MLOps: Rapidly build, test, and manage production-ready machine learning life cycles at scale

Product type Paperback

Published in Apr 2021

Publisher Packt

ISBN-13 9781800562882

Length 370 pages

Edition 1st Edition

Author (1):

Emmanuel Raj

Data preprocessing

Raw data cannot be passed directly to an ML model for training; it first has to be refined, or preprocessed. To analyze the imported data further, we perform a series of steps that bring it into a shape suitable for ML training. We start by assessing the quality of the data, checking its accuracy, completeness, reliability, relevance, and timeliness. Next, we calibrate the required data and encode text as numerical data, which is the form ML training expects. Lastly, we analyze the correlations and time series, and filter out data that is irrelevant for training ML models.
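To make the encoding and filtering steps concrete, here is a minimal sketch. It assumes a small hypothetical DataFrame: the column names, the values, and the 0.5 correlation threshold are illustrative choices, not taken from the book's dataset or code.

```python
import pandas as pd

# Hypothetical sample data; names and values are assumptions for illustration.
df = pd.DataFrame({
    "Temperature_C": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Humidity": [0.9, 0.8, 0.7, 0.6, 0.5],
    "Sensor_noise": [0.2, -0.1, 0.0, -0.1, 0.2],
    "Weather_condition": ["clear", "clear", "rain", "rain", "rain"],
})

# Encode the text column as integer codes. Categories are ordered
# alphabetically, so "clear" becomes 0 and "rain" becomes 1.
df["Weather_condition"] = df["Weather_condition"].astype("category").cat.codes

# Pearson correlation of every feature with the encoded target column.
target = "Weather_condition"
correlations = df.corr()[target].drop(target)

# Keep only features whose absolute correlation reaches the (assumed) threshold.
THRESHOLD = 0.5
kept = correlations[correlations.abs() >= THRESHOLD].index.tolist()
filtered = df[kept + [target]]

print(correlations.round(2))
print(filtered.columns.tolist())
```

A correlation threshold is only one possible filter; which columns count as irrelevant ultimately depends on the problem and on domain knowledge.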

Data quality assessment

To assess the quality of the data, we look at its accuracy, completeness, reliability, relevance, and timeliness. First, let's check whether the data is complete and reliable by inspecting the formats, summary statistics, and anomalies such as missing data. We use pandas functions as follows:
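A minimal sketch of such checks, assuming a small hypothetical weather-style DataFrame (the column names, the values, and the `quality_report` helper are illustrative and not the book's dataset or code), might look like this:

```python
import numpy as np
import pandas as pd

# Hypothetical sample with a few deliberately missing values.
df = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2021-01-01 00:00", "2021-01-01 01:00",
        "2021-01-01 02:00", "2021-01-01 03:00",
    ]),
    "Temperature_C": [1.5, np.nan, 2.1, 1.9],
    "Humidity": [0.81, 0.79, 0.80, np.nan],
    "Weather_condition": ["rain", "clear", None, "rain"],
})


def quality_report(frame):
    """Collect formats, descriptive statistics, and missing-value counts."""
    return {
        # Format check: the dtype pandas inferred for each column.
        "dtypes": frame.dtypes.astype(str).to_dict(),
        # Descriptive statistics (count, mean, min, max, ...) per column.
        "statistics": frame.describe(),
        # Number of missing entries in each column.
        "missing_per_column": frame.isnull().sum().to_dict(),
        # True if any value anywhere in the frame is missing.
        "has_missing": bool(frame.isnull().values.any()),
    }


report = quality_report(df)
print(report["dtypes"])
print(report["statistics"])
print(report["missing_per_column"])
print("Any missing values:", report["has_missing"])
```

`DataFrame.info()` prints a similar overview of dtypes and non-null counts in one call; the helper above simply returns the pieces as values so they can be inspected or logged.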