Now, let's go through some commercial tools and platforms that are used for automated ML.
DataRobot
DataRobot is a proprietary platform for automated ML. As one of the leaders in the automated ML space, DataRobot claims to "automate the end-to-end process for building, deploying, and maintaining AI at scale". DataRobot's model repository contains open source as well as proprietary algorithms and approaches for data scientists, with a focus on business outcomes. DataRobot's offerings are available for both cloud and on-premises implementations.
The platform can be accessed here: https://www.datarobot.com/platform/.
Google Cloud AutoML
Integrated into the Google Cloud Platform, the Google Cloud AutoML offering aims to help train high-quality custom ML models with minimal effort and ML expertise. This offering provides AutoML Vision, AutoML Video Intelligence, AutoML Natural Language, AutoML Translation, and AutoML Tables for structured data analysis. We will discuss this Google offering in more detail in Chapter 8, Machine Learning with Google Cloud Platform, and Chapter 9, Automated Machine Learning with GCP Cloud AutoML, of this book.
Google Cloud AutoML can be accessed at https://cloud.google.com/automl.
Amazon SageMaker Autopilot
AWS offers a wide variety of capabilities around AI and ML. SageMaker Autopilot is one of these offerings and helps to "automatically build, train, and tune models" as part of the AWS ecosystem. SageMaker Autopilot provides an end-to-end automated ML life cycle that includes automatic feature engineering, model and algorithm selection, model tuning, deployment, and ranking based on performance. We will discuss AWS SageMaker Autopilot in Chapter 6, Machine Learning with Amazon Web Services, and Chapter 7, Doing Automated Machine Learning with Amazon SageMaker Autopilot.
Amazon SageMaker Autopilot can be accessed at https://aws.amazon.com/sagemaker/autopilot/.
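To make the idea of "ranking based on performance" more concrete, here is a minimal, tool-agnostic sketch of a model leaderboard. It does not use the SageMaker Autopilot API; the toy dataset, the candidate "models" (simple threshold rules), and the function names are all hypothetical and chosen only for illustration.

```python
import random


def make_dataset(n=200, seed=0):
    """Generate a toy binary dataset: the label is 1 when a noisy x exceeds 0.5."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        label = 1 if x + rng.gauss(0, 0.1) > 0.5 else 0
        data.append((x, label))
    return data


# Hypothetical candidates: each "model" maps a feature value to a predicted label.
CANDIDATES = {
    "always_zero": lambda x: 0,
    "threshold_0.3": lambda x: 1 if x > 0.3 else 0,
    "threshold_0.5": lambda x: 1 if x > 0.5 else 0,
    "threshold_0.7": lambda x: 1 if x > 0.7 else 0,
}


def accuracy(model, data):
    """Fraction of examples for which the model predicts the correct label."""
    correct = sum(1 for x, y in data if model(x) == y)
    return correct / len(data)


def build_leaderboard(candidates, validation_data):
    """Score every candidate on the same data and sort from best to worst."""
    scores = [(name, accuracy(model, validation_data))
              for name, model in candidates.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


validation = make_dataset()
leaderboard = build_leaderboard(CANDIDATES, validation)
for rank, (name, score) in enumerate(leaderboard, start=1):
    print(f"{rank}. {name}: accuracy={score:.3f}")
```

A real automated ML platform would train genuine models, tune them, and use cross-validation or a held-out set, but the final step is the same in spirit: every candidate is scored with one metric and sorted.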
Azure Automated ML
Microsoft Azure provides automated ML capabilities to help data scientists build ML models with speed and at scale. The platform offers automated feature engineering capabilities such as missing value imputation, transformations and encodings, and dropping high-cardinality and no-variance features. Azure's automated ML also supports time series forecasting, algorithm selection, hyperparameter tuning, guardrails to keep model bias in check, and a model leaderboard for ranking and scoring. We will discuss the Azure ML and AutoML offerings in Chapter 4, Getting Started with Azure Machine Learning, and Chapter 5, Automated Machine Learning with Microsoft Azure.
Azure's automated ML offering can be accessed at https://azure.microsoft.com/en-us/services/machine-learning/automatedml/.
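The feature engineering steps mentioned for Azure's automated ML (missing value imputation and dropping high-cardinality and no-variance features) can be illustrated with a small standard-library sketch. This is not Azure's implementation or API; the sample rows, the median imputation strategy, the 0.9 uniqueness threshold, and the helper names are assumptions made purely for illustration.

```python
from statistics import median


def impute_missing(rows, column):
    """Replace None in a numeric column with the median of the observed values."""
    observed = [row[column] for row in rows if row[column] is not None]
    fill = median(observed)
    for row in rows:
        if row[column] is None:
            row[column] = fill


def columns_to_drop(rows, max_unique_ratio=0.9):
    """Flag no-variance columns and high-cardinality string columns."""
    drop = []
    for column in rows[0]:
        values = [row[column] for row in rows]
        unique = set(values)
        if len(unique) == 1:
            # Every row has the same value, so the column carries no signal.
            drop.append(column)
        elif (all(isinstance(v, str) for v in values)
              and len(unique) / len(values) > max_unique_ratio):
            # Nearly every row has its own value (for example, an ID column).
            drop.append(column)
    return drop


# Hypothetical sample data.
rows = [
    {"customer_id": "c-001", "age": 34, "country": "US", "plan": "basic"},
    {"customer_id": "c-002", "age": None, "country": "US", "plan": "pro"},
    {"customer_id": "c-003", "age": 51, "country": "US", "plan": "basic"},
    {"customer_id": "c-004", "age": 28, "country": "US", "plan": "pro"},
]

impute_missing(rows, "age")
dropped = columns_to_drop(rows)
cleaned = [{k: v for k, v in row.items() if k not in dropped} for row in rows]

print(dropped)
print(cleaned[1])
```

In this toy data, customer_id is dropped because every value is unique, country is dropped because it never varies, and the missing age is filled with the median of the observed ages.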
H2O Driverless AI
H2O's open source offerings were discussed earlier in the Open source platforms and books section. The commercial offering of H2O Driverless AI is an automated ML platform that addresses the needs of feature engineering, architecture search, and pipeline generation. The "bring your own recipe" feature is unique (even though it's now being adopted by other vendors) and is used to integrate custom algorithms. The commercial product has extensive capabilities and a feature-rich user interface for data scientists to get up to speed.
H2O Driverless AI can be accessed at https://www.h2o.ai/products/h2o-driverless-ai/.
Other notable frameworks and tools in this space include Autoxgboost, RapidMiner Auto Model, BigML, MLJar, MLBox, Dataiku, and Salesforce Einstein (powered by TransmogrifAI). The links to their toolkits can be found in this book's Appendix. The following table is from Mark Lin's Awesome-AutoML-Papers repository and outlines some of the most important automated machine learning toolkits, along with their corresponding links:
Figure 1.3 – Automated ML projects from Awesome-AutoML-Papers by Mark Lin
The classification type column specifies whether the library supports Neural Architecture Search (NAS), Hyperparameter Optimization (HPO), or Automated Feature Engineering (AutoFE).
The future of automated ML
As the industry makes significant investments in the area surrounding automated ML, it is poised to become an important part of our enterprise data science workflows, if it isn't already. Serving as a valuable assistant, this apprentice will take care of anything unwieldy and trivial so that data scientists and knowledge workers can focus on the business problem. Even though the current focus is limited to automated feature engineering, architecture search, and hyperparameter optimization, we will also see meta-learning techniques introduced in other areas to help automate this automation process.
Due to the increasing demand for the democratization of AI and ML, we will see automated ML become mainstream in the industry, with all the major tools and hyperscaler platforms providing it as an inherent part of their ML offerings. This next generation of automated ML-equipped tools will allow us to perform data preparation, domain-customized feature engineering, model selection and counterfactual analysis, operationalization, explainability, and monitoring, and to create feedback loops. This will make it easier for us to focus on what's important in the business, including business insights and impact.
The automated ML challenges and limitations
As we mentioned earlier, data scientists aren't getting replaced, and automated ML is not a job killer – for now. The job of data scientists will evolve as the toolsets and their functions continue to change.
The reasons for this are twofold. Firstly, automated ML does not automate data science as a discipline. It is definitely a time saver for performing automated feature engineering, architecture search, hyperparameter optimization, or running multiple experiments in parallel. However, there are various other essential parts of the data science life cycle that cannot be easily automated, given the current state of automated ML.
The second key reason is that being a data scientist is not a homogeneous role; the competencies and responsibilities related to it vary across industries and organizations. In light of the democratization of data science through automated ML, the so-called junior data scientists will gain assistance from automated feature engineering capabilities, and this will speed up their data munging and wrangling practices. Meanwhile, senior engineers will have more time to focus on improving their business outcomes by designing better KPI metrics and enhancing the model's performance. As you can see, this will help all tiers of data science practitioners gain familiarity with the business domain and explore any cross-cutting concerns. Senior data scientists also have the responsibility of monitoring model and data quality and drift, as well as maintaining versioning, auditability, governance, lineage, and other MLOps (Machine Learning Operations) cross-cutting concerns.
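As an illustration of what monitoring data drift can look like in practice, the following sketch computes the Population Stability Index (PSI), one common way to compare the distribution of a feature at training time with its distribution in production. The choice of PSI, the equal-width binning, the synthetic data, and the function name are our assumptions for this example; this chapter does not prescribe a particular drift metric.

```python
import math
import random


def population_stability_index(expected, actual, bins=10):
    """Compare two numeric samples using equal-width bins over the expected range."""
    low, high = min(expected), max(expected)
    width = (high - low) / bins
    if width == 0:
        raise ValueError("expected sample has no spread; cannot build bins")

    def bin_fractions(values):
        counts = [0] * bins
        for v in values:
            # Values outside the expected range are clamped into the edge bins.
            index = int((v - low) / width)
            index = min(max(index, 0), bins - 1)
            counts[index] += 1
        # A small floor avoids log(0) and division by zero for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    expected_frac = bin_fractions(expected)
    actual_frac = bin_fractions(actual)
    return sum((a - e) * math.log(a / e)
               for e, a in zip(expected_frac, actual_frac))


# Synthetic data: one production sample matches training, one has a shifted mean.
rng = random.Random(42)
training = [rng.gauss(0.0, 1.0) for _ in range(5000)]
production_same = [rng.gauss(0.0, 1.0) for _ in range(5000)]
production_shifted = [rng.gauss(1.0, 1.0) for _ in range(5000)]

psi_same = population_stability_index(training, production_same)
psi_shifted = population_stability_index(training, production_shifted)
print(f"PSI (no drift): {psi_same:.4f}")
print(f"PSI (shifted):  {psi_shifted:.4f}")
```

A commonly quoted rule of thumb treats a PSI below 0.1 as stable and a value above 0.25 as a significant shift, but deciding what to do about a drifted feature remains a judgment call for the data scientist.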
Enabling the explainability and transparency of models to address any underlying bias is also a critical component for regulated industries across the world. Due to its highly subjective nature, there is limited functionality to address this automatically in the current toolsets; this is where a socially aware data scientist can provide a tremendous amount of value to stop the perpetuation of algorithmic bias.
A Getting Started guide for enterprises
Congratulations! You have almost made it to the end of the first chapter without dozing off – kudos! Now, you must be wondering: this automated ML thing sounds rad, but how do I go about using it in my company? Here are some pointers.
First, read the rest of this book to familiarize yourself with the concepts, technology, tools, and platforms. It is important to understand the landscape and to recognize that automated ML is a tool in your data science toolkit; it does not replace your data scientists.
Second, use automated ML as a democratization tool across the enterprise when you're dealing with analytics. Build a training plan for your team to become familiar with the tools, provide guidance, and chart a path to automation in data science workflows.
Lastly, due to the large churn in the feature sets, start with a smaller footprint, probably with an open source stack, before you commit to an enterprise framework. Scaling up this way will help you understand your own automation needs and give you time to do comparison shopping.