Addressing data science challenges with DataRobot
Now that you know what DataRobot offers, let's revisit the data science process and its challenges to see how DataRobot helps address them and why it is a valuable tool in your toolkit.
Lack of good-quality data
While DataRobot cannot do much to fix data quality at the source, it does offer some capabilities for handling data with quality problems:
- Automatically highlights data quality problems.
- Automated EDA and data visualization expose issues that could be missed.
- Handles and imputes missing values (see the sketch after this list).
- Detects data drift.
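To make the first and third points concrete, here is a minimal sketch, using pandas, of the kind of missing-value checks and imputation that DataRobot performs automatically on upload. The filename and the 20% threshold are hypothetical, chosen only for illustration:

```python
import pandas as pd

# Hypothetical training file; DataRobot runs equivalent checks automatically
# when a dataset is uploaded.
df = pd.read_csv("loans.csv")

# Flag columns with a large share of missing values (20% is an arbitrary cutoff).
missing_share = df.isna().mean().sort_values(ascending=False)
print(missing_share[missing_share > 0.2])

# Simple median imputation for numeric columns; DataRobot's blueprints include
# imputation steps matched to each algorithm, so this is rarely done by hand.
numeric_cols = df.select_dtypes(include="number").columns
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())
```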
Explosion of data
While the growth in data volume and variety is unlikely to slow down any time soon, DataRobot offers several capabilities to address this challenge:
- Support for Spark SQL enables efficient pre-processing of large datasets (see the sketch after this list).
- Automatically handles categorical data encodings and selects appropriate model blueprints.
- Automatically handles geospatial, text, and image features.
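As an illustration of the Spark SQL point, here is a hedged sketch, using standalone PySpark, of pre-processing that aggregates a large raw dataset into a modeling table before it is loaded into DataRobot. The paths, table, and column names are hypothetical:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("preprocess").getOrCreate()

# Hypothetical raw event data; aggregate it down to one row per customer
# before uploading the result to DataRobot.
spark.read.parquet("s3://bucket/events/").createOrReplaceTempView("events")

features = spark.sql("""
    SELECT customer_id,
           COUNT(*)      AS event_count,
           SUM(amount)   AS total_amount,
           MAX(event_ts) AS last_event_ts
    FROM events
    GROUP BY customer_id
""")
features.write.mode("overwrite").parquet("s3://bucket/features/")
```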
Shortage of experienced data scientists
This is a key challenge for most organizations and data science teams, and one that DataRobot is well positioned to address:
- Provides capabilities that cover most steps of the data science process.
- Automates many routine tasks through pre-built blueprints that encode best practices.
- Experienced data scientists can build and deploy models much faster.
- Data analysts and data scientists who are less comfortable with coding can use DataRobot's capabilities with little or no code.
- Experienced data scientists who are comfortable with coding can use the APIs to automatically build and deploy an order of magnitude more models than would otherwise be feasible, without needing support from data engineering or IT staff (see the sketch after this list).
- Even experienced data scientists do not know every possible algorithm, and they rarely have time to try many combinations or to build analysis visualizations and explanations for every model. DataRobot takes care of many of these tasks for them, freeing them to spend more time understanding the problem and analyzing results.
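As a sketch of the API-driven workflow mentioned in this list, the DataRobot Python client can create a project, run Autopilot, and inspect the leaderboard in a few lines. The dataset, target name, and token below are placeholders, and method names vary across client versions (newer versions, for example, replace set_target with analyze_and_model):

```python
import datarobot as dr

# Placeholder endpoint and API token for your own DataRobot instance.
dr.Client(endpoint="https://app.datarobot.com/api/v2", token="YOUR_API_TOKEN")

# Create a project from a local file and run full Autopilot on a target column.
project = dr.Project.create(sourcedata="loans.csv", project_name="Loan defaults")
project.set_target(target="is_default", mode=dr.AUTOPILOT_MODE.FULL_AUTO)
project.wait_for_autopilot()

# The leaderboard is ordered by the project's optimization metric.
best_model = project.get_models()[0]
print(best_model)
```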
Immature tools and environments
This is a key barrier to the productivity and effectiveness of any data science organization, and DataRobot addresses it directly by offering the following:
- Ease of deploying any model as a REST API (see the sketch after this list).
- Ease of developing multiple competing models and selecting the best ones without worrying about the underlying infrastructure, installing compatible versions, or coding and debugging. These tasks can take up a lot of time that would be better spent understanding and solving the business problem.
- DataRobot encodes many best practices into its development process to prevent mistakes. It automatically takes care of many small details that even experienced data scientists can overlook, and that would otherwise lead to flawed models or rework.
- DataRobot provides automated documentation of models and modeling steps that could otherwise be glossed over or forgotten. This becomes valuable later, when a data scientist has to revisit an old model that they or someone else built.
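To illustrate the first point, here is a minimal sketch that deploys the leaderboard winner from the previous sketch as a REST endpoint via the Python client. Whether a prediction server is available, and the exact method names, depend on your DataRobot installation and client version:

```python
import datarobot as dr

# Assumes the client is already configured and `best_model` was selected
# from the leaderboard as in the earlier sketch.
server = dr.PredictionServer.list()[0]
deployment = dr.Deployment.create_from_learning_model(
    model_id=best_model.id,
    label="Loan default scorer",
    default_prediction_server_id=server.id,
)
print(deployment.id)  # The model can now be called as a REST API.
```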
Black box models
This is a key challenge on which DataRobot has done extensive work, providing methods that help make models more explainable, such as the following:
- Automated generation of feature importance (using Shapley values and other methods) and partial dependence plots for models (see the sketch after this list)
- Automated generation of explanations for specific predictions
- Automated generation of simpler models that could be used to explain the complex models
- Ability to create models that are inherently more explainable, such as Generalized Additive Models (GAMs)
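For example, feature impact for a trained model can be requested through the Python client, continuing the earlier sketch. The result field names shown here reflect one client version and may differ in others:

```python
# Computes permutation-based feature impact on the server (or fetches it
# if already computed) for the `best_model` from the earlier sketch.
impact = best_model.get_or_request_feature_impact()
for row in impact[:5]:
    print(row["featureName"], row["impactNormalized"])
```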
Bias and fairness
DataRobot has recently added capabilities to help detect bias and fairness issues in models. These are no guarantee of a complete absence of bias, but they are a good starting point for moving in the right direction. Some of the capabilities added are listed here:
- Specify protected features that need to be checked for bias.
- Specify bias metrics that you want to use to check for fairness.
- Evaluate your models using these metrics for protected features (see the sketch after this list).
- Use model explanations to investigate whether there is potential unfairness.
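DataRobot computes such fairness metrics for you once protected features are specified. As an illustration of what a metric like proportional parity measures, here is a manual computation on a small, made-up scored dataset:

```python
import pandas as pd

# Made-up scored data: model predictions plus a protected attribute.
scored = pd.DataFrame({
    "prediction": [1, 0, 1, 1, 0, 1, 0, 0],
    "gender":     ["F", "F", "F", "M", "M", "M", "M", "F"],
})

# Proportional parity: each group's rate of favorable predictions divided
# by the most favored group's rate (1.0 means parity).
rates = scored.groupby("gender")["prediction"].mean()
print(rates / rates.max())
```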
Many people believe that with these automated tools you no longer need data scientists; nothing could be further from the truth. It is, however, clear that such tools will make data science teams far more valuable to their organizations by unlocking more value faster and by making those organizations more competitive. It is therefore likely that tools such as DataRobot will become increasingly commonplace and see widespread use.