Search icon CANCEL
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Conferences
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Engineering MLOps

You're reading from   Engineering MLOps Rapidly build, test, and manage production-ready machine learning life cycles at scale

Arrow left icon
Product type Paperback
Published in Apr 2021
Publisher Packt
ISBN-13 9781800562882
Length 370 pages
Edition 1st Edition
Arrow right icon
Author (1):
Arrow left icon
Emmanuel Raj Emmanuel Raj
Author Profile Icon Emmanuel Raj
Emmanuel Raj
Arrow right icon
View More author details
Toc

Table of Contents (18) Chapters Close

Preface 1. Section 1: Framework for Building Machine Learning Models
2. Chapter 1: Fundamentals of an MLOps Workflow FREE CHAPTER 3. Chapter 2: Characterizing Your Machine Learning Problem 4. Chapter 3: Code Meets Data 5. Chapter 4: Machine Learning Pipelines 6. Chapter 5: Model Evaluation and Packaging 7. Section 2: Deploying Machine Learning Models at Scale
8. Chapter 6: Key Principles for Deploying Your ML System 9. Chapter 7: Building Robust CI/CD Pipelines 10. Chapter 8: APIs and Microservice Management 11. Chapter 9: Testing and Securing Your ML Solution 12. Chapter 10: Essentials of Production Release 13. Section 3: Monitoring Machine Learning Models in Production
14. Chapter 11: Key Principles for Monitoring Your ML System 15. Chapter 12: Model Serving and Monitoring 16. Chapter 13: Governing the ML System for Continual Learning 17. Other Books You May Enjoy

Data preprocessing

Raw data cannot be directly passed to the ML model for training purposes. We have to refine or preprocess the data before training the ML model. To further analyze the imported data, we will perform a series of steps to preprocess the data into a suitable shape for the ML training. We start by assessing the quality of the data to check for accuracy, completeness, reliability, relevance, and timeliness. After this, we calibrate the required data and encode text into numerical data, which is ideal for ML training. Lastly, we will analyze the correlations and time series, and filter out irrelevant data for training ML models.

Data quality assessment

To assess the quality of the data, we look for accuracy, completeness, reliability, relevance, and timeliness. Firstly, let's check if the data is complete and reliable by assessing the formats, cumulative statistics, and anomalies such as missing data. We use pandas functions as follows:

df.describe(...
lock icon The rest of the chapter is locked
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at €18.99/month. Cancel anytime