Addressing data science challenges with DataRobot
Now that you know what DataRobot offers, let's revisit the data science process and its challenges to see how DataRobot helps address them and why it is a valuable tool in your toolkit.
Lack of good-quality data
While DataRobot cannot do much to fix data quality at the source, it does offer some capabilities for handling data with quality problems:
- Automatically highlights data quality problems.
- Automated EDA and data visualization expose issues that could be missed.
- Handles and imputes missing values (see the sketch after this list).
- Detects data drift.
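To make the first and third points concrete, here is a minimal sketch, using pandas, of the kind of missing-value checks and imputation that DataRobot performs automatically on upload. The filename and the 20% threshold are hypothetical, chosen only for illustration:

```python
import pandas as pd

# Hypothetical training file; DataRobot runs equivalent checks automatically
# when a dataset is uploaded.
df = pd.read_csv("loans.csv")

# Flag columns with a large share of missing values (20% is an arbitrary cutoff).
missing_share = df.isna().mean().sort_values(ascending=False)
print(missing_share[missing_share > 0.2])

# Simple median imputation for numeric columns; DataRobot's blueprints include
# imputation steps matched to each algorithm, so this is rarely done by hand.
numeric_cols = df.select_dtypes(include="number").columns
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())
```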
Explosion of data
While the growth in data volume and variety is unlikely to slow down any time soon, DataRobot offers several capabilities to address this challenge:
- Support for Spark SQL enables efficient pre-processing of large datasets (see the sketch after this list).
- Automatically handles categorical data encodings and selects appropriate model blueprints.
- Automatically handles geospatial, text, and image features.
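As an illustration of the Spark SQL point, here is a hedged sketch, using standalone PySpark, of pre-processing that aggregates a large raw dataset into a modeling table before it is loaded into DataRobot. The paths, table, and column names are hypothetical:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("preprocess").getOrCreate()

# Hypothetical raw event data; aggregate it down to one row per customer
# before uploading the result to DataRobot.
spark.read.parquet("s3://bucket/events/").createOrReplaceTempView("events")

features = spark.sql("""
    SELECT customer_id,
           COUNT(*)      AS event_count,
           SUM(amount)   AS total_amount,
           MAX(event_ts) AS last_event_ts
    FROM events
    GROUP BY customer_id
""")
features.write.mode("overwrite").parquet("s3://bucket/features/")
```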
Shortage of experienced data scientists
This is a key challenge for most organizations and data science teams, and one that DataRobot is well positioned to address:
- Provides capabilities that cover most steps of the data science process.
- Automates many routine tasks through pre-built blueprints that encode best practices.
- Experienced data scientists can build and deploy models much faster.
- Data analysts and data scientists who are less comfortable with coding can use DataRobot's capabilities with little or no code.
- Experienced data scientists who are comfortable with coding can use the APIs to automatically build and deploy an order of magnitude more models than would otherwise be feasible, without needing support from data engineering or IT staff (see the sketch after this list).
- Even experienced data scientists do not know every possible algorithm, and they rarely have time to try many combinations or to build analysis visualizations and explanations for every model. DataRobot takes care of many of these tasks for them, freeing them to spend more time understanding the problem and analyzing results.
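As a sketch of the API-driven workflow mentioned in this list, the DataRobot Python client can create a project, run Autopilot, and inspect the leaderboard in a few lines. The dataset, target name, and token below are placeholders, and method names vary across client versions (newer versions, for example, replace set_target with analyze_and_model):

```python
import datarobot as dr

# Placeholder endpoint and API token for your own DataRobot instance.
dr.Client(endpoint="https://app.datarobot.com/api/v2", token="YOUR_API_TOKEN")

# Create a project from a local file and run full Autopilot on a target column.
project = dr.Project.create(sourcedata="loans.csv", project_name="Loan defaults")
project.set_target(target="is_default", mode=dr.AUTOPILOT_MODE.FULL_AUTO)
project.wait_for_autopilot()

# The leaderboard is ordered by the project's optimization metric.
best_model = project.get_models()[0]
print(best_model)
```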
Immature tools and environments
This is a key barrier to the productivity and effectiveness of any data science organization, and DataRobot addresses it directly by offering the following:
- Ease of deploying any model as a REST API (see the sketch after this list).
- Ease of developing multiple competing models and selecting the best ones without worrying about the underlying infrastructure, installing compatible versions, or coding and debugging. These tasks can take up a lot of time that would be better spent understanding and solving the business problem.
- DataRobot encodes many best practices into its development process to prevent mistakes. It automatically takes care of many small details that even experienced data scientists can overlook, and that would otherwise lead to flawed models or rework.
- DataRobot provides automated documentation of models and modeling steps that could otherwise be glossed over or forgotten. This becomes valuable later, when a data scientist has to revisit an old model that they or someone else built.
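To illustrate the first point, here is a minimal sketch that deploys the leaderboard winner from the previous sketch as a REST endpoint via the Python client. Whether a prediction server is available, and the exact method names, depend on your DataRobot installation and client version:

```python
import datarobot as dr

# Assumes the client is already configured and `best_model` was selected
# from the leaderboard as in the earlier sketch.
server = dr.PredictionServer.list()[0]
deployment = dr.Deployment.create_from_learning_model(
    model_id=best_model.id,
    label="Loan default scorer",
    default_prediction_server_id=server.id,
)
print(deployment.id)  # The model can now be called as a REST API.
```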
Black box models
This is a key challenge on which DataRobot has done extensive work, providing methods that help make models more explainable, such as the following:
- Automated generation of feature importance (using Shapley values and other methods) and partial dependence plots for models (see the sketch after this list)
- Automated generation of explanations for specific predictions
- Automated generation of simpler models that could be used to explain the complex models
- Ability to create models that are inherently more explainable, such as Generalized Additive Models (GAMs)
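For example, feature impact for a trained model can be requested through the Python client, continuing the earlier sketch. The result field names shown here reflect one client version and may differ in others:

```python
# Computes permutation-based feature impact on the server (or fetches it
# if already computed) for the `best_model` from the earlier sketch.
impact = best_model.get_or_request_feature_impact()
for row in impact[:5]:
    print(row["featureName"], row["impactNormalized"])
```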
Bias and fairness
DataRobot has recently added capabilities to help detect bias and fairness issues in models. These are no guarantee of a complete absence of bias, but they are a good starting point for moving in the right direction. Some of the capabilities added are listed here:
- Specify protected features that need to be checked for bias.
- Specify bias metrics that you want to use to check for fairness.
- Evaluate your models using these metrics for protected features (see the sketch after this list).
- Use model explanations to investigate whether there is potential unfairness.
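DataRobot computes such fairness metrics for you once protected features are specified. As an illustration of what a metric like proportional parity measures, here is a manual computation on a small, made-up scored dataset:

```python
import pandas as pd

# Made-up scored data: model predictions plus a protected attribute.
scored = pd.DataFrame({
    "prediction": [1, 0, 1, 1, 0, 1, 0, 0],
    "gender":     ["F", "F", "F", "M", "M", "M", "M", "F"],
})

# Proportional parity: each group's rate of favorable predictions divided
# by the most favored group's rate (1.0 means parity).
rates = scored.groupby("gender")["prediction"].mean()
print(rates / rates.max())
```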
Many people believe that with these automated tools you no longer need data scientists; nothing could be further from the truth. It is, however, clear that such tools will make data science teams far more valuable to their organizations by unlocking more value faster and by making those organizations more competitive. It is therefore likely that tools such as DataRobot will become increasingly commonplace and see widespread use.