Agile Machine Learning with DataRobot

Automate each step of the machine learning life cycle, from understanding problems to delivering value

Product type Paperback
Published in Dec 2021
Publisher Packt
ISBN-13 9781801076807
Length 344 pages
Edition 1st Edition
Authors (2): Bipin Chadha, Sylvester Juwe

Addressing data science challenges with DataRobot

Now that you know what DataRobot offers, let's revisit the data science process and challenges to see how DataRobot helps in addressing these challenges and why this is a valuable tool in your toolkit.

Lack of good-quality data

While DataRobot cannot do much to address this challenge, it does offer some capabilities to handle data with quality problems:

  • Automatically highlights data quality problems.
  • Exposes issues that could otherwise be missed through automated EDA and data visualization.
  • Handles and imputes missing values.
  • Detects data drift.
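
To make two of these capabilities concrete, here is a minimal sketch in plain Python of median imputation and of a drift score based on the Population Stability Index (PSI). This is an illustration only, not DataRobot's implementation: the function names, the equal-width binning, and the choice of PSI as the drift statistic are all assumptions made for the example.

```python
import math
from statistics import median


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def population_stability_index(expected, actual, bins=4):
    """PSI between a training sample and a scoring sample of one numeric feature.

    Bins are equal-width over the range of the training (expected) sample.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a constant feature

    def shares(sample):
        counts = [0] * bins
        for v in sample:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp values outside the training range
            counts[idx] += 1
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / len(sample), 1e-6) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

A PSI of zero means the two samples fall into the bins in identical proportions; larger values indicate that the scoring data has moved away from the training data.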

Explosion of data

While it is unlikely that the increase in the volume and variety of data will slow down any time soon, DataRobot offers several capabilities to address this challenge:

  • Support for SparkSQL enables the efficient pre-processing of large datasets.
  • Automatically handles categorical data encodings and selects appropriate model blueprints.
  • Automatically handles geospatial features, text features, and image features.
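
As a reminder of what an automated categorical encoding step replaces, the following is a minimal one-hot encoding sketch in plain Python. It is an illustrative assumption, not the encoding DataRobot applies; the function name `one_hot_encode` and the `column=value` naming of the generated features are hypothetical.

```python
def one_hot_encode(rows, column):
    """Replace one categorical column with a 0/1 indicator column per category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Copy every other column unchanged.
        new_row = {k: v for k, v in row.items() if k != column}
        for cat in categories:
            new_row[f"{column}={cat}"] = 1 if row[column] == cat else 0
        encoded.append(new_row)
    return encoded
```

Doing this by hand for every categorical feature, and choosing a different encoding where one-hot is a poor fit, is the kind of routine work the automation is meant to absorb.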

Shortage of experienced data scientists

This is a key challenge for most organizations and data science teams, and DataRobot is well positioned to address this challenge:

  • Provides capabilities that cover most of the data science process steps.
  • Automates many routine tasks through pre-built blueprints that encode best practices.
  • Experienced data scientists can build and deploy models much faster.
  • Data analysts or data scientists who are not very comfortable coding can utilize DataRobot capabilities without having to write a lot of code.
  • Experienced data scientists who are comfortable with coding can use the APIs to automatically build and deploy an order of magnitude more models than would otherwise be feasible, without needing support from data engineering or IT staff.
  • Even experienced data scientists do not know every possible algorithm, and they typically do not have the time to try out many combinations or to build analyses, visualizations, and explanations for every model. DataRobot takes care of many of these tasks for them, leaving more time for understanding the problem and analyzing results.

Immature tools and environments

This is a key barrier to the productivity and effectiveness of any data science organization. DataRobot clearly addresses this key challenge by offering the following:

  • Ease of deployment of any model as a REST API.
  • Ease of use in developing multiple competing models and selecting the best ones, without worrying about the underlying infrastructure or the installation of compatible versions, and without coding and debugging. These tasks can take up a lot of time that would be better spent on understanding and solving the business problem.
  • DataRobot encodes many best practices into its development process to prevent mistakes. It automatically takes care of many small details that even experienced data scientists can overlook, and that would otherwise lead to flawed models or rework.
  • DataRobot provides automated documentation of models and modeling steps that could otherwise be glossed over or forgotten. This becomes valuable at a later time when a data scientist has to revisit an old model built by them or someone else.
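
To show what consuming a model deployed as a REST API typically looks like from the client side, here is a minimal sketch using only the Python standard library. The URL, the bearer-token header, and the JSON payload shape are placeholders assumed for illustration; they are not DataRobot's actual prediction endpoint or authentication scheme, so consult the deployment's own integration details for the real values.

```python
import json
from urllib import request


def build_scoring_request(url, api_token, rows):
    """Build (but do not send) a POST request carrying rows to score as JSON."""
    body = json.dumps(rows).encode("utf-8")
    return request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_token}",  # assumed auth scheme
        },
        method="POST",
    )


# Hypothetical endpoint and token, for illustration only.
req = build_scoring_request(
    "https://example.com/deployments/DEPLOYMENT_ID/predictions",
    "YOUR_TOKEN",
    [{"age": 42, "income": 55000}],
)
# To send it against a real endpoint:
# with request.urlopen(req) as resp:
#     predictions = json.load(resp)
```

The point of the sketch is that the caller only needs an HTTP client and a payload; none of the modeling environment has to be installed where predictions are consumed.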

Black box models

DataRobot has done extensive work on this key challenge, providing methods that help make models more explainable, such as the following:

  • Automated generation of feature importance (using Shapley values and other methods) and partial dependence plots for models
  • Automated generation of explanations for specific predictions
  • Automated generation of simpler models that could be used to explain the complex models
  • Ability to create models that are inherently more explainable, such as Generalized Additive Models (GAMs)
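
For readers unfamiliar with Shapley values, the sketch below computes them exactly for a single prediction by enumerating every feature coalition. It is a textbook illustration under stated assumptions, not the method DataRobot uses internally: features left out of a coalition are replaced by a baseline row, the function name `shapley_values` is hypothetical, and the exhaustive enumeration is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values for one prediction relative to a baseline row.

    Features absent from a coalition take their value from `baseline`.
    """
    n = len(instance)
    features = range(n)

    def value(coalition):
        row = [instance[i] if i in coalition else baseline[i] for i in features]
        return predict(row)

    phi = []
    for i in features:
        others = [j for j in features if j != i]
        total = 0.0
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi
```

For a linear model, each feature's Shapley value reduces to its coefficient times its distance from the baseline, and the values always sum to the difference between the prediction for the instance and the prediction for the baseline.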

Bias and fairness

Recently, DataRobot has added capabilities to help detect bias and fairness issues in models. This is no guarantee of a complete lack of bias, but it's a good starting point to ensure positive movement in this direction. Some of the capabilities added are listed here:

  • Specify protected features that need to be checked for bias.
  • Specify bias metrics that you want to use to check for fairness.
  • Evaluate your models using metrics for protected features.
  • Use model explanations to investigate whether there is potential for unfairness.
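
As an illustration of what evaluating a model against a protected feature can involve, the sketch below computes the favorable-prediction rate for each group and the ratio between the lowest and highest rate. The metric choice and the function names are assumptions made for this example; the source does not state which bias metrics DataRobot offers or how it computes them.

```python
from collections import defaultdict


def favorable_rates(protected, predictions, favorable=1):
    """Share of favorable predictions within each protected-feature group."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for group, pred in zip(protected, predictions):
        totals[group] += 1
        hits[group] += pred == favorable  # True counts as 1
    return {g: hits[g] / totals[g] for g in totals}


def disparate_impact_ratio(rates):
    """Lowest group rate divided by the highest; 1.0 means equal rates."""
    highest = max(rates.values())
    return min(rates.values()) / highest if highest else 1.0
```

A ratio well below 1.0 signals that one group receives favorable predictions much less often than another. It does not prove unfairness by itself, but it marks where the model explanations deserve a closer look.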

While many people believe that with these automated tools, you no longer need data scientists, nothing could be further from the truth. It is, however, obvious that such tools will make data science teams a lot more valuable to their organizations by unlocking more value faster and by making these organizations more competitive. It is therefore likely that tools such as DataRobot will become increasingly commonplace and see widespread use.
