Automated Machine Learning: Hyperparameter optimization, neural architecture search, and algorithm selection with cloud platforms

Adnan Masood
Paperback, Feb 2021, 312 pages, 1st Edition

Chapter 1: A Lap around Automated Machine Learning

"All models are wrong, but some are useful."

– George Edward Pelham Box FRS

"One of the holy grails of machine learning is to automate more and more of the feature engineering process."

– Pedro Domingos, A Few Useful Things to Know about Machine Learning

This chapter will provide an overview of the concepts, tools, and technologies surrounding automated Machine Learning (ML). This introduction hopes to provide both a solid overview for novices and serve as a reference for experienced ML practitioners. We will start by introducing the ML development life cycle while navigating through the product ecosystem and the data science problems it addresses, before looking at feature selection, neural architecture search, and hyperparameter optimization.

It's very plausible that you are reading this book on an e-reader that's connected to a website that recommended this manuscript based on your reading interests. We live in a world today where your digital breadcrumbs give telltale signs of not only your reading interests, but where you like to eat, which friend you like most, where you will shop next, whether you will show up to your next appointment, and who you would vote for. In this age of big data, this raw data becomes information that, in turn, helps build knowledge and insights into so-called wisdom.

Artificial Intelligence (AI) and its underlying implementations of ML and deep learning help us not only find the metaphorical needle in the haystack, but also to see the underlying trends, seasonality, and patterns in these large data streams to make better predictions. In this book, we will cover one of the key emerging technologies in AI and ML; that is, automated ML, or AutoML for short.

In this chapter, we will cover the following topics:

  • The ML development life cycle
  • Automated ML
  • How automated ML works
  • Democratization of data science
  • Debunking automated ML myths
  • Automated ML ecosystem (open source and commercial)
  • Automated ML challenges and limitations

Let's get started!

The ML development life cycle

Before introducing you to automated ML, we should first define how we operationalize and scale ML experiments into production. To go beyond Hello-World apps and works-on-my-machine-in-my-Jupyter-notebook kinds of projects, enterprises need to adopt a robust, reliable, and repeatable model development and deployment process. Just as in a software development life cycle (SDLC), the ML or data science life cycle is also a multi-stage, iterative process.

The life cycle includes several steps – the process of problem definition and analysis, building the hypothesis (unless you are doing exploratory data analysis), selecting business outcome metrics, exploring and preparing data, building and creating ML models, training those ML models, evaluating and deploying them, and maintaining the feedback loop:

Figure 1.1 – Team data science process

A successful data science team has the discipline to prepare the problem statement and hypothesis, preprocess the data, select the appropriate features from the data based on the input of the Subject-Matter Expert (SME) and the right model family, optimize model hyperparameters, review outcomes and the resulting metrics, and finally fine-tune the models. If this sounds like a lot, remember that it is an iterative process where the data scientist also has to ensure that the data, model versioning, and drift are being addressed. They must also put guardrails in place to guarantee the model's performance is being monitored. Just to make this even more interesting, there are also frequent champion-challenger and A/B experiments happening in production – may the best model win.

In such an intricate and multifaceted environment, data scientists can use all the help they can get. Automated ML extends a helping hand with the promise to take care of the mundane, the repetitive, and the less intellectually demanding tasks so that the data scientists can focus on the important stuff.

Automated ML

"How many members of a certain demographic group does it take to perform a specified task?"

"A finite number: one to perform the task and the remainder to act in a manner stereotypical of the group in question." <insert your light bulb joke here>

This is meta humor – the finest type of humor for ensuing hilarity for those who are quantitatively inclined. Similarly, automated ML is a class of meta learning, also known as learning to learn – the idea that you can apply the automation principles to themselves to make the process of gaining insights even faster and more elegant.

Automated ML is the approach and underlying technology of applying certain automation techniques to accelerate the model development life cycle. Automated ML enables citizen data scientists and domain experts to train ML models, and helps them build optimal solutions to ML problems. It provides a higher level of abstraction for finding the best model, or an ensemble of models, suitable for a specific problem. It assists data scientists by automating the mundane and repetitive tasks of feature engineering, architecture search, and hyperparameter optimization. The following diagram represents the ecosystem of automated ML:

Figure 1.2 – Automated ML ecosystem

These three key areas – feature engineering, architecture search, and hyperparameter optimization – hold the most promise for the democratization of AI and ML. Some automated feature engineering techniques that are finding domain-specific usable features in datasets include expand/reduce, hierarchically organizing transformations, meta learning, and reinforcement learning. For architectural search (also known as neural architecture search), evolutionary algorithms, local search, meta learning, reinforcement learning, transfer learning, network morphism, and continuous optimization are employed.

Last, but not least, we have hyperparameter optimization, which is the art and science of finding the right type of parameters outside the model. A variety of techniques are used here, including Bayesian optimization, evolutionary algorithms, Lipschitz functions, local search, meta learning, particle swarm optimization, random search, and transfer learning, to name a few.
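
To make one of these techniques concrete, here is a minimal sketch of random search, probably the simplest item on the list. The validation_loss function is a hypothetical stand-in for "train a model and measure its validation loss"; its shape, the parameter names, and the sampling ranges are all assumptions chosen for illustration.

```python
import random


def validation_loss(learning_rate, depth):
    """Hypothetical stand-in for training a model and returning its
    validation loss. Lowest at learning_rate=0.1 and depth=6."""
    return (learning_rate - 0.1) ** 2 + 0.01 * (depth - 6) ** 2


def random_search(n_trials, seed=0):
    """Sample hyperparameters at random and keep the best trial."""
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(n_trials):
        params = {
            # Log-uniform sample between 0.001 and 1
            "learning_rate": 10 ** rng.uniform(-3, 0),
            # Integer sample between 2 and 10, both ends included
            "depth": rng.randint(2, 10),
        }
        loss = validation_loss(**params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


best_params, best_loss = random_search(n_trials=200)
print(best_params, best_loss)
```

Sampling the learning rate on a log scale is a common choice because useful values tend to span several orders of magnitude. In a real project, each call to the objective is a full training run, which is why smarter strategies such as Bayesian optimization try to spend fewer trials.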

In the next section, we will provide a detailed overview of these three key areas of automated ML. You will see some examples of them, alongside code, in the upcoming chapters. Now, let's discuss how automated ML really works in detail by covering feature engineering, architecture search, and hyperparameter optimization.

How automated ML works

ML techniques work great when it comes to finding patterns in large datasets. Today, we use these techniques for anomaly detection, customer segmentation, customer churn analysis, demand forecasting, predictive maintenance, and pricing optimization, among hundreds of other use cases.

A typical ML life cycle comprises data collection, data wrangling, pipeline management, model retraining, and model deployment, during which data wrangling is typically the most time-consuming task.

Extracting meaningful features out of data, and then using them to build a model while finding the right algorithm and tuning the parameters, is also a very time-consuming process. Can we automate this process using the very thing we are trying to build here (meta enough?); that is, should we automate ML? Well, that is how this all started – with someone attempting to print a 3D printer using a 3D printer.

A typical data science workflow starts with a business problem (hopefully!), and it is used to either prove a hypothesis or to discover new patterns in the existing data. It requires data, and that data needs to be cleaned and preprocessed, which takes an awfully large amount of time – almost as much as 80% of your total time. This "data munging" or wrangling includes cleaning, de-duplication, outlier analysis and removal, transforming, mapping, structuring, and enriching. Essentially, we're taming this unwieldy, vivacious, raw real-world data and putting it in the desired format for analysis and modeling so that we can gain meaningful insights from it.
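
As a small illustration of two of these wrangling steps, the following sketch de-duplicates records and then removes outliers with the interquartile range (IQR) rule. The record layout, the column name amount, and the 1.5 multiplier are assumptions made for this example; real pipelines would normally lean on a dataframe library.

```python
from statistics import quantiles


def deduplicate(rows):
    """Drop exact duplicate records while preserving their order."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def drop_outliers(rows, column, k=1.5):
    """Keep rows whose value lies within k * IQR of the quartiles."""
    values = [row[column] for row in rows]
    q1, _, q3 = quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [row for row in rows if low <= row[column] <= high]


# Hypothetical raw records: one exact duplicate and one suspicious amount
raw = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": 12.0},
    {"id": 2, "amount": 12.0},
    {"id": 3, "amount": 11.0},
    {"id": 4, "amount": 13.0},
    {"id": 5, "amount": 12.5},
    {"id": 6, "amount": 500.0},
]
clean = drop_outliers(deduplicate(raw), "amount")
print([row["id"] for row in clean])
```

Whether a value such as 500.0 is an error or a genuinely important observation is a question for the SME, which is one reason this stage resists full automation.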

Next, we must select and engineer features, which means figuring out what features are useful, and then brainstorming and working with SMEs on the importance and validity of these features. Validating how these features would work with your model, the fitness from both a technical and business perspective, and improving these features as needed is also a critical part of the feature engineering process. The feedback loop to the SME is often very important, albeit the least emphasized part of the feature engineering pipeline. The transparency of models stems from clear features – if features such as race or gender give higher accuracy regarding your loan repayment propensity model, this does not mean it's a good idea to use them. In fact, an SME would tell you – if your conscience hasn't – that it's a terrible idea and that you should look for more meaningful and less sexist, racist, and xenophobic features. We will discuss this further in Chapter 10, AutoML in the Enterprise, when we discuss operationalization.

Even though the task of "selecting a model family" sounds like a reality show, that is what data scientists and ML engineers do as part of their day-to-day job. Model selection is the task of picking the right model that best describes the data at hand. This involves selecting an ML model from a set of candidate models. Automated ML can give you a helping hand with this.
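
A bare-bones version of model selection can be sketched as follows: fit every candidate on training data, score each on held-out data, and keep the one with the lowest error. The two candidates (a mean baseline and a one-variable linear fit), the synthetic data, and the train/validation split are all assumptions made for this illustration.

```python
from statistics import mean


def fit_mean(xs, ys):
    """Baseline candidate: always predict the training mean."""
    y_bar = mean(ys)
    return lambda x: y_bar


def fit_linear(xs, ys):
    """Candidate: ordinary least squares with a single feature."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: slope * x + intercept


def mse(model, xs, ys):
    """Mean squared error of a fitted model on the given data."""
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))


def select_model(candidates, train, valid):
    """Fit each candidate on train, score on valid, return the winner."""
    scores = {name: mse(fit(*train), *valid) for name, fit in candidates.items()}
    return min(scores, key=scores.get), scores


# Synthetic data: a noisy line, split into training and validation halves
xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 + (0.3 if i % 2 else -0.3) for i, x in enumerate(xs)]
train, valid = (xs[::2], ys[::2]), (xs[1::2], ys[1::2])

best, scores = select_model({"mean": fit_mean, "linear": fit_linear}, train, valid)
print(best, scores)
```

Automated ML tools run the same kind of loop over far more model families, usually with cross-validation rather than a single split.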

Hyperparameters

You will hear about hyperparameters a lot, so let's make sure you understand what they are.

Each model has its own internal and external parameters. Internal parameters (also known as model parameters, or just parameters) are the ones intrinsic to the model, such as its weights and predictor matrix, while external parameters or hyperparameters are "outside" the model itself, such as the learning rate and its number of iterations. An intuitive example can be derived from k-means, a well-understood unsupervised clustering algorithm known for its simplicity.

The k in k-means stands for the number of clusters required, and epochs (pronounced epics, as in Doctor Who is an epic show!) are used to specify the number of passes that are done over the training data. Both of these are examples of hyperparameters – that is, the parameters that are not intrinsic to the model itself. Similarly, the learning rate for training a neural network, C and sigma for support vector machines, the number of leaves or depth of a tree, the latent factors in a matrix factorization, and the number of hidden layers in a deep neural network are all examples of hyperparameters.
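
The distinction is easy to see in code. The following is a minimal one-dimensional k-means sketch, written from scratch for illustration rather than taken from any library: k and epochs are passed in by the caller (hyperparameters), while the centroids are learned from the data (model parameters). The function name and the sample points are assumptions.

```python
import random


def kmeans_1d(points, k, epochs, seed=0):
    """Cluster 1-D points. k and epochs are hyperparameters set by the
    caller; the returned centroids are parameters learned from the data."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize from the data itself
    for _ in range(epochs):  # one epoch = one full pass over the points
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if empty)
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return sorted(centroids)


points = [1.0, 1.2, 0.8, 9.0, 9.5, 10.5]
print(kmeans_1d(points, k=2, epochs=10))
```

Changing k from 2 to 3 changes what the algorithm can learn, but nothing in the training loop can discover that k itself should change. That choice sits outside the model, which is what makes it a hyperparameter.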

Selecting the right hyperparameters has been called tuning your instrument, which is where the magic happens. In ML tribal folklore, these elusive numbers have been brandished as "nuisance parameters", to the point where proverbial statements such as "tuning is more of an art than a science" and "tuning models is like black magic" tend to discourage newcomers in the industry. Automated ML is here to change this perception by helping you choose the right hyperparameters; more on this later. Automated ML enables citizen data scientists to build, train, and deploy ML models, thus possibly disrupting the status quo.

Important note

Some consider the term "citizen data scientists" as a euphemism for non-experts, but SMEs and people who are curious about analytics are some of the most important people – and don't let anyone tell you otherwise.

In conclusion, from building the correct ensembles of models to preprocessing the data, selecting the right features and model family, choosing and optimizing model hyperparameters, and evaluating the results, automated ML offers algorithmic solutions that can programmatically address these challenges.

The need for automated ML

At the time of writing, OpenAI's GPT-3 model has recently been announced, and it has an incredible 175 billion parameters. Due to this ever-increasing model complexity, which includes big data and an exponentially increasing number of features, we now have a necessity to not only be able to tune these parameters, but also have sophisticated, repeatable procedures in place to tweak these proverbial knobs so that they can be adjusted. This complexity makes it less accessible for citizen data scientists, business subject matter experts, and domain experts – which might sound like job security, but it is not good for business, nor for the long-term success of the field.

Also, this isn't just about the hyperparameters, but the entire pipeline and the reproducibility of the results becoming harder as the model's complexity grows, which curtails AI democratization.

Democratization of data science

To nobody's surprise, data scientists are in high demand! As a LinkedIn Workforce Report found in August 2018, there were more than 151,000 data scientist jobs going unfilled across the US (https://economicgraph.linkedin.com/resources/linkedin-workforce-report-august-2018). Due to this disparity in supply and demand, the notion of democratization of AI, which is enabling people who are not formally trained in math, statistics, computer science, and related quantitative fields to design, develop, and use predictive models, has become quite popular. There are arguments on both sides regarding whether an SME, a domain SME, a business executive, or a program manager can effectively work as a citizen data scientist – which I consider to be a layer of abstraction argument. For businesses to gain meaningful actionable insights in a timely manner, there is no other way than to accelerate the process of raw data to insight, and insights to action. It is quite evident to anyone who has served in the analytics trenches. This means that no citizen data scientists are left behind.

As disclaimers and caveats go, like everything else, automated ML is not the proverbial silver bullet. However, automated methods for model selection and hyperparameter optimization bear the promise of enabling non-experts and citizen data scientists to train, test, and deploy high-quality ML models. The tooling around automated ML is shaping up and hopefully, this gap will be reduced, allowing for increased participation. Now, let's review some of the myths surrounding automated ML and debunk them, MythBusters style!

Debunking automated ML myths

Much like the moon landing, when it comes to automated ML, there are more than a few conspiracy theories and myths surrounding it. Let's take a look at a few that have been debunked.

Myth #1 – The end of data scientists

One of the most frequently asked questions around automated ML is, "Will automated ML be a job killer for data scientists?"

The short answer is, not anytime soon – and the long answer, as always, is more nuanced and boring.

The data science life cycle, as we discussed previously, has several moving parts where domain expertise and subject matter insights are critical. The data scientists collaborate with businesses to build a hypothesis, analyze the results, and decide on any actionable insights that may create business impact. The act of automating mundane and repeatable tasks in data science does not take away from the cognitively challenging task of discovering insights. If anything, instead of spending hours sifting through data and cleaning up features, it frees up data scientists to learn more about the underlying business. A large variety of real-world data science applications need dedicated human supervision, as well as the steady gaze of domain experts to ensure the fine-grained actions that come out of these insights reflect the desired outcome.

One of the proposed approaches, A Human-in-the-Loop (HITL) Perspective on AutoML: Milestones and the Road Ahead by Doris Jung-Lin Lee et al., builds upon the notion of keeping humans in the loop. HITL suggests three different levels of automation in data science workflows: user-driven, cruise control, and autopilot. As you progress through the maturity curve and the confidence of specific models increases, the user-driven flows move to cruise control and eventually to the autopilot stage. By leveraging different areas of expertise by building a talent pool, automated ML can help in multiple stages of the data science life cycle by engaging humans.

Myth #2 – Automated ML can only solve toy problems

This is a frequent argument from the skeptics of automated ML – that it can only be used to solve well-defined, controlled toy problems in data science and does not bode well for any real-world scenario.

The reality is quite the contrary – but I think the confusion arises from an incorrect assumption that we can just take a dataset, throw it to an automated ML model, and we will get meaningful insights. If we were to believe the hype around automated ML, then it should be able to look at messy data, perform a magical cleanup, figure out all the important features (including target variables), find the right model, tune its hyperparameters, and voila – it's built a magical pipeline!

Even though it does sound absurd when spoken out loud, this is exactly what you see in carefully crafted automated ML product demos. Then, there's the hype cycle, which has the opposite effect of diminishing the real value of automated ML offerings. The technical approaches powering automated ML are robust, and the academic rigor that's put into bringing these theories and techniques to life is like any other area of AI and ML.

In future chapters, we will look at several examples of hyperscaler platforms that benefit from automated ML, including – but not limited to – Google Cloud Platform, AWS, and Azure. These testimonials lead us to believe that real-world automated ML is not limited to eking out better accuracy in Kaggle competitions, but rather poised to disrupt the industry in a big way.

Automated ML ecosystem

It almost feels redundant to point out that automated ML is a rapidly growing field; it's far from being commoditized – existing frameworks are constantly being evolved and new offerings and platforms are becoming mainstream. In the upcoming chapters, we will discuss some of these frameworks and libraries in detail. For now, we will provide you with a breadth-first introduction to get you acquainted with the automated ML ecosystem before we do a deep dive.

Open source platforms and tools

In this section, we will briefly review some of the open source automated ML platforms and tools that are available. We will deep dive into some of these platforms in Chapter 3, Automated Machine Learning with Open Source Tools and Libraries.

Microsoft NNI

Microsoft Neural Network Intelligence (NNI) is an open source platform that addresses the three key areas of any automated ML life cycle – automated feature engineering, architecture search (also referred to as neural architecture search or NAS), and hyperparameter tuning (also referred to as hyperparameter optimization or HPO). The toolkit also offers model compression features and operationalization via KubeFlow, Azure ML, DL Workspace (DLTS), and Kubernetes over AWS.

The toolkit is available on GitHub to be downloaded: https://github.com/microsoft/nni.

auto-sklearn

Scikit-learn (also known as sklearn) is a popular ML library for Python development. As part of this ecosystem and based on Efficient and Robust Automated ML by Feurer et al., auto-sklearn is an automated ML toolkit that performs algorithm selection and hyperparameter tuning using Bayesian optimization, meta-learning, and ensemble construction.

The toolkit is available on GitHub to be downloaded: github.com/automl/auto-sklearn.
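
To give a feel for what ensemble construction can look like, here is a minimal sketch of greedy forward selection: starting from an empty ensemble, repeatedly add whichever candidate's predictions most reduce the validation error of the averaged ensemble. This is an illustrative from-scratch example and not auto-sklearn's code or API; the model names and predictions are made up.

```python
def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def average(preds):
    """Element-wise average of several prediction lists."""
    return [sum(col) / len(col) for col in zip(*preds)]


def greedy_ensemble(model_preds, target, rounds=5):
    """Pick models one at a time (with replacement) so that the averaged
    ensemble has the lowest validation error after each pick."""
    chosen = []
    for _ in range(rounds):
        best_name = min(
            model_preds,
            key=lambda name: mse(
                average([model_preds[n] for n in chosen + [name]]), target
            ),
        )
        chosen.append(best_name)
    return chosen


# Hypothetical validation predictions from three already-trained models
target = [1.0, 2.0, 3.0, 4.0]
model_preds = {
    "a": [1.5, 2.5, 3.5, 4.5],  # consistently too high
    "b": [0.5, 1.5, 2.5, 3.5],  # consistently too low
    "c": [4.0, 4.0, 4.0, 4.0],  # ignores the input entirely
}
members = greedy_ensemble(model_preds, target, rounds=2)
print(members, mse(average([model_preds[m] for m in members]), target))
```

Here, two individually biased models cancel each other out once averaged, while the weak third model is never picked. That is the intuition behind building ensembles from the candidates a search has already trained.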

Auto-Weka

Weka, short for Waikato Environment for Knowledge Analysis, is an open source ML library that provides a collection of visualization tools and algorithms for data analysis and predictive modeling. Auto-Weka is similar to auto-sklearn but is built on top of Weka and implements the approaches described in the paper for model selection, hyperparameter optimization, and more.

The developers describe Auto-WEKA as going beyond selecting a learning algorithm and setting its hyperparameters in isolation. Instead, it implements a fully automated approach. The authors' intent is for Auto-WEKA "to help non-expert users to more effectively identify ML algorithms" – that is, democratization for SMEs – via "hyperparameter settings appropriate to their applications".

The toolkit is available on GitHub to be downloaded: github.com/automl/autoweka.

Auto-Keras

Keras is one of the most widely used deep learning frameworks and is an integral part of the TensorFlow 2.0 ecosystem. Auto-Keras, based on the paper by Jin et al., proposes that it is "a novel method for efficient neural architecture search with network morphism, enabling Bayesian optimization". This helps the neural architectural search "by designing a neural network kernel and algorithm for optimizing acquisition functions in a tree-structured space". Auto-Keras is the implementation of this deep learning architecture search via Bayesian optimization.

The toolkit is available on GitHub to be downloaded: github.com/jhfjhfj1/autokeras.

TPOT

The Tree-based Pipeline Optimization Tool, or TPOT for short (nice acronym, eh!), is a product of the University of Pennsylvania's Computational Genetics Lab. TPOT is an automated ML tool written in Python. It helps build and optimize ML pipelines with genetic programming. Built on top of scikit-learn, TPOT helps automate feature selection, preprocessing, and construction, as well as model selection and parameter optimization, by "exploring thousands of possible pipelines to find the best one". It is just one of the many toolkits with a small learning curve.

The toolkit is available on GitHub to be downloaded: github.com/EpistasisLab/tpot.
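
The idea of evolving pipelines can be sketched with a much simpler, mutation-only evolutionary loop. To be clear, this is not TPOT's API and not genetic programming proper (which evolves tree-structured programs and uses crossover). The search space and the score function are hypothetical stand-ins for real pipeline components and a cross-validated score.

```python
import random

# Hypothetical search space of pipeline choices
SEARCH_SPACE = {
    "scaler": ["none", "standard", "minmax"],
    "model": ["tree", "linear", "knn"],
    "max_depth": [2, 4, 6, 8],
}


def score(pipeline):
    """Hypothetical stand-in for a cross-validated score (higher is better)."""
    s = {"none": 0.0, "standard": 0.2, "minmax": 0.1}[pipeline["scaler"]]
    s += {"tree": 0.5, "linear": 0.3, "knn": 0.4}[pipeline["model"]]
    return s - 0.02 * abs(pipeline["max_depth"] - 6)


def mutate(pipeline, rng):
    """Copy a pipeline and re-draw one randomly chosen setting."""
    child = dict(pipeline)
    gene = rng.choice(list(SEARCH_SPACE))
    child[gene] = rng.choice(SEARCH_SPACE[gene])
    return child


def evolve(generations=10, population_size=8, seed=0):
    """Keep the best half each generation and refill with mutated copies."""
    rng = random.Random(seed)
    population = [
        {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}
        for _ in range(population_size)
    ]
    history = []
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        history.append(score(population[0]))
        survivors = population[: population_size // 2]
        children = [
            mutate(rng.choice(survivors), rng)
            for _ in range(population_size - len(survivors))
        ]
        population = survivors + children
    return max(population, key=score), history


best_pipeline, history = evolve()
print(best_pipeline, history)
```

Because the best candidates always survive into the next generation, the best score never gets worse from one generation to the next. Real tools pay for each score evaluation with a full training run, so the number of generations and the population size become hyperparameters of the search itself.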

Ludwig – a code-free AutoML toolbox

Uber's automated ML tool, Ludwig, is an open source deep learning toolbox used for experimentation, testing, and training ML models. Built on top of TensorFlow, Ludwig enables users to create model baselines and perform automated ML-style experiments with different network architectures and models. In its latest release (at the time of writing), Ludwig now integrates with CometML and supports BERT text encoders.

The toolkit is available on GitHub to be downloaded: https://github.com/uber/ludwig.

AutoGluon – an AutoML toolkit for deep learning

From AWS Labs, with the goal of democratization of ML in mind, AutoGluon has been developed to enable "easy-to-use and easy-to-extend AutoML with a focus on deep learning and real-world applications spanning image, text, or tabular data". AutoGluon, an integral part of AWS's automated ML strategy, enables both junior and seasoned data scientists to build deep learning models and end-to-end solutions with ease. Like other automated ML toolkits, AutoGluon offers network architecture search, model selection, and custom model improvements.

The toolkit is available on GitHub to be downloaded: https://github.com/awslabs/autogluon.

Featuretools

Featuretools is an excellent Python framework that helps with automated feature engineering by using deep feature synthesis. Feature engineering is a tough problem due to its very nuanced nature. However, this open source toolkit, with its excellent timestamp handling and reusable feature primitives, provides an excellent framework you can use to build and extract a combination of features and look at what impact they have.

The toolkit is available on GitHub to be downloaded: https://github.com/FeatureLabs/featuretools/.
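
The core intuition behind this style of automated feature engineering is that aggregation primitives are applied across relationships between tables. The following is a hand-rolled sketch of that idea for a single parent-child relationship; it does not use the Featuretools API, and the table layout, column names, and primitive set are assumptions.

```python
from collections import defaultdict
from statistics import mean

# Aggregation primitives applied to each parent's child rows
PRIMITIVES = {"count": len, "sum": sum, "mean": mean, "max": max}


def aggregate_features(parent_ids, child_rows, parent_key, value_column):
    """Build one feature per primitive for every parent record."""
    grouped = defaultdict(list)
    for row in child_rows:
        grouped[row[parent_key]].append(row[value_column])

    features = {}
    for pid in parent_ids:
        values = grouped.get(pid, [])
        features[pid] = {}
        for name, fn in PRIMITIVES.items():
            feature_name = f"{name}({value_column})"
            if values:
                features[pid][feature_name] = fn(values)
            else:
                # No child rows: the count is 0, other aggregates are undefined
                features[pid][feature_name] = 0 if name == "count" else None
    return features


# Hypothetical customers (parents) and their transactions (children)
customers = ["c1", "c2", "c3"]
transactions = [
    {"customer_id": "c1", "amount": 10.0},
    {"customer_id": "c1", "amount": 30.0},
    {"customer_id": "c2", "amount": 5.0},
]
features = aggregate_features(customers, transactions, "customer_id", "amount")
print(features["c1"])
```

Stacking such primitives across several levels of relationships (for example, customers to sessions to transactions) is what makes the synthesized features "deep".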

H2O AutoML

H2O's AutoML provides an open source version of H2O's commercial product, with APIs in R, Python, and Scala. This is an open source, distributed (multi-core and multi-node) implementation for automated ML algorithms and supports basic data preparation via a mix of grid and random search.

The toolkit is available on GitHub to be downloaded: github.com/h2oai/h2o-3.

Commercial tools and platforms

Now, let's go through some commercial tools and platforms that are used for automated ML.

DataRobot

DataRobot is a proprietary platform for automated ML. As one of the leaders in the automated ML space, DataRobot claims to "automate the end-to-end process for building, deploying, and maintaining AI at scale". DataRobot's model repository contains open source as well as proprietary algorithms and approaches for data scientists, with a focus on business outcomes. DataRobot's offerings are available for both the cloud and on-premises implementations.

The platform can be accessed here: https://www.datarobot.com/platform/.

Google Cloud AutoML

Integrated in the Google Cloud Compute platform, the Google Cloud AutoML offering aims to help train high-quality custom ML models with minimal effort and ML expertise. This offering provides AutoML Vision, AutoML Video Intelligence, AutoML Natural Language, AutoML Translation, and AutoML Tables for structured data analysis. We will discuss this Google offering in more detail in Chapter 8, Machine Learning with Google Cloud Platform, and Chapter 9, Automated Machine Learning with GCP Cloud AutoML of this book.

Google Cloud AutoML can be accessed at https://cloud.google.com/automl.

Amazon SageMaker Autopilot

AWS offers a wide variety of capabilities around AI and ML. SageMaker Autopilot is one of these offerings and helps to "automatically build, train, and tune models" as part of the AWS ecosystem. SageMaker Autopilot provides an end-to-end automated ML life cycle that includes automatic feature engineering, model and algorithm selection, model tuning, deployment, and ranking based on performance. We will discuss AWS SageMaker Autopilot in Chapter 6, Machine Learning with Amazon Web Services, and Chapter 7, Doing Automated Machine Learning with Amazon SageMaker Autopilot.

Amazon SageMaker Autopilot can be accessed at https://aws.amazon.com/sagemaker/autopilot/.

Azure Automated ML

Microsoft Azure provides automated ML capabilities to help data scientists build ML models with speed and at scale. The platform offers automated feature engineering capabilities such as missing value imputation, transformations and encodings, and dropping high-cardinality and no-variance features. Azure's automated ML also supports time series forecasting, algorithm selection, hyperparameter tuning, guardrails to keep model bias in check, and a model leaderboard for ranking and scoring. We will discuss the Azure ML and AutoML offerings in Chapter 4, Getting Started with Azure Machine Learning, and Chapter 5, Automated Machine Learning with Microsoft Azure.

Azure's automated ML offering can be accessed at https://azure.microsoft.com/en-us/services/machine-learning/automatedml/.
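
To illustrate what featurization steps of this kind do, here is a small from-scratch sketch of mean imputation followed by dropping no-variance and high-cardinality columns. It is not Azure's implementation or API; the column names, the 0.9 cardinality threshold, and the rule that only text columns count as categorical are assumptions made for this example.

```python
from statistics import mean


def impute_mean(values):
    """Replace missing (None) entries with the mean of the present ones.
    Assumes a numeric column with at least one value present."""
    fill = mean(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def is_uninformative(values, max_cardinality_ratio=0.9):
    """Flag columns with no variance, or text columns where nearly
    every row has its own distinct value (for example, identifiers)."""
    distinct = len(set(values))
    if distinct <= 1:
        return True
    is_categorical = all(isinstance(v, str) for v in values)
    return is_categorical and distinct / len(values) > max_cardinality_ratio


def featurize(columns):
    """Impute missing values, then drop uninformative columns."""
    cleaned = {}
    for name, values in columns.items():
        if any(v is None for v in values):
            values = impute_mean(values)
        if not is_uninformative(values):
            cleaned[name] = values
    return cleaned


# Hypothetical dataset stored as column name -> list of values
columns = {
    "user_id": ["a", "b", "c", "d"],      # high cardinality
    "country": ["US", "US", "DE", "DE"],
    "constant": [1, 1, 1, 1],             # no variance
    "age": [30, None, 40, 50],            # one missing value
}
print(featurize(columns))
```

A unique identifier or a constant column carries no signal that generalizes, so removing such columns early keeps later search steps from wasting effort on them.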

H2O Driverless AI

H2O's open source offerings were discussed earlier in the Open source platforms and tools section. The commercial offering of H2O Driverless AI is an automated ML platform that addresses the needs of feature engineering, architecture search, and pipeline generation. The "bring your own recipe" feature is unique (even though it's now being adopted by other vendors) and is used to integrate custom algorithms. The commercial product has extensive capabilities and a feature-rich user interface for data scientists to get up to speed.

H2O Driverless AI can be accessed at https://www.h2o.ai/products/h2o-driverless-ai/.

Other notable frameworks and tools in this space include Autoxgboost, RapidMiner Auto Model, BigML, MLJar, MLBox, DATAIKU, and Salesforce Einstein (powered by TransmogrifAI). The links to their toolkits can be found in this book's Appendix. The following table is from Mark Lin's Awesome-AutoML-Papers repository and outlines some of the most important automated machine learning toolkits, along with their corresponding links:

Figure 1.3 – Automated ML projects from Awesome-AutoML-Papers by Mark Lin

The classification type column specifies whether the library supports Network Architecture Search (NAS), Hyperparameter Optimization (HPO), and Automated Feature Engineering (AutoFE).

The future of automated ML

As the industry makes significant investments in the area surrounding automated ML, it is poised to become an important part of our enterprise data science workflows, if it isn't already. Serving as a valuable assistant, this apprentice will help data scientists and knowledge workers focus on the business problem and take care of anything unwieldy and trivial. Even though the current focus is limited to automated feature engineering, architecture search, and hyperparameter optimization, we will also see that meta-learning techniques will be introduced in other areas to help automate this automation process.

Due to the increasing demand for the democratization of AI and ML, we will see automated ML become mainstream in the industry – with all the major tools and hyperscaler platforms providing it as an inherent part of their ML offerings. This next generation of automated ML-equipped tools will allow us to perform data preparation, domain customized feature engineering, model selection and counterfactual analysis, operationalization, explainability, monitoring, and create feedback loops. This will make it easier for us to focus on what's important in the business, including business insights and impact.

Automated ML challenges and limitations

As we mentioned earlier, data scientists aren't getting replaced, and automated ML is not a job killer – for now. The job of data scientists will evolve as the toolsets and their functions continue to change.

The reasons for this are twofold. Firstly, automated ML does not automate data science as a discipline. It is definitely a time saver for performing automated feature engineering, architecture search, hyperparameter optimization, or running multiple experiments in parallel. However, there are various other essential parts of the data science life cycle that cannot be easily automated, given the current state of automated ML.

The second key reason is that being a data scientist is not a homogeneous role: the competencies and responsibilities related to it vary across industries and organizations. In light of the democratization of data science with automated ML, the so-called junior data scientists will gain assistance from automated feature engineering capabilities, and this will speed up their data munging and wrangling practices. Meanwhile, senior engineers will have more time to focus on improving their business outcomes by designing better KPI metrics and enhancing the model's performance. As you can see, this will help all tiers of data science practitioners gain familiarity with the business domain and explore any cross-cutting concerns. Senior data scientists also have the responsibility of monitoring model and data quality and drift, as well as maintaining versioning, auditability, governance, lineage, and other MLOps (Machine Learning Operations) cross-cutting concerns.
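As an illustration of what monitoring for data drift can involve, the following sketch computes a Population Stability Index (PSI) between a reference sample of one numeric feature and a newer sample. PSI is one common choice among many, not something this chapter prescribes; the function name, the bin count, and the sample data are assumptions made for the example.

```python
import math


def population_stability_index(expected, actual, num_bins=10):
    """Compare two samples of one numeric feature; larger values mean more drift."""
    lo, hi = min(expected), max(expected)
    # Fall back to a width of 1.0 if every reference value is identical.
    width = (hi - lo) / num_bins or 1.0

    def bin_fractions(values):
        counts = [0] * num_bins
        for v in values:
            # Clamp values outside the reference range into the edge bins.
            idx = min(max(int((v - lo) / width), 0), num_bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = bin_fractions(expected), bin_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


training = [i / 100 for i in range(1000)]   # reference data, values 0.00 to 9.99
shifted = [x + 3 for x in training]         # same shape, shifted upward by 3

psi_same = population_stability_index(training, training)
psi_shifted = population_stability_index(training, shifted)
print(psi_same, psi_shifted)
```

A frequently quoted rule of thumb treats a PSI above roughly 0.2 as a sign of meaningful drift, but the right threshold depends on the feature and the business context, which is where the judgment of a senior practitioner comes in.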

Enabling the explainability and transparency of models to address any underlying bias is also a critical component for regulated industries across the world. Due to its highly subjective nature, there is limited functionality to address this automatically in the current toolsets; this is where a socially aware data scientist can provide a tremendous amount of value to stop the perpetuation of algorithmic bias.

A Getting Started guide for enterprises

Congratulations! You have almost made it to the end of the first chapter without dozing off – kudos! Now, you must be wondering: this automated ML thing sounds rad, but how do I go about using it in my company? Here are some pointers.

First, read the rest of this book to familiarize yourself with the concepts, technology, tools, and platforms. It is important to understand the landscape and to recognize that automated ML is a tool in your data science toolkit; it does not replace your data scientists.

Second, use automated ML as a democratization tool across the enterprise when you're dealing with analytics. Build a training plan for your team to become familiar with the tools, provide guidance, and chart a path to automation in data science workflows.

Lastly, due to the large churn in the feature sets, start with a smaller footprint, probably with an open source stack, before you commit to an enterprise framework. Scaling up this way will help you understand your own automation needs and give you time to do comparison shopping.

Summary

In this chapter, we covered the ML development life cycle and then defined automated ML and how it works. While building a case for the need for automated ML, we discussed the democratization of data science, debunked the myths surrounding automated ML, and provided a detailed walk-through of the automated ML ecosystem. Here, we reviewed the open source tools and then explored the commercial landscape. Finally, we discussed the future of automated ML, commented on its challenges and limitations, and provided some pointers on how to get started in an enterprise.

In the next chapter, we'll look under the hood of the technologies, techniques, and tools that are used to make automated ML possible. We hope that this chapter has introduced you to the automated ML fundamentals and that you are now ready to do a deeper dive into the topics that we discussed.

Further reading

For more information on the topics that were covered in this chapter, please take a look at the following suggested books and links:


Key benefits

  • Get up to speed with AutoML using OSS, Azure, AWS, GCP, or any platform of your choice
  • Eliminate mundane tasks in data engineering and reduce human errors in machine learning models
  • Find out how you can make machine learning accessible for all users to promote decentralized processes

Description

Every machine learning engineer deals with systems that have hyperparameters, and the most basic task in automated machine learning (AutoML) is to automatically set these hyperparameters to optimize performance. The latest deep neural networks have a wide range of hyperparameters for their architecture, regularization, and optimization, which can be customized effectively to save time and effort. This book reviews the underlying techniques of automated feature engineering, model and hyperparameter tuning, gradient-based approaches, and much more. You'll discover different ways of implementing these techniques in open source tools and then learn to use enterprise tools for implementing AutoML in three major cloud service providers: Microsoft Azure, Amazon Web Services (AWS), and Google Cloud Platform. As you progress, you’ll explore the features of cloud AutoML platforms by building machine learning models using AutoML. The book will also show you how to develop accurate models by automating time-consuming and repetitive tasks in the machine learning development lifecycle. By the end of this machine learning book, you’ll be able to build and deploy AutoML models that are not only accurate, but also increase productivity, allow interoperability, and minimize feature engineering tasks.

Who is this book for?

Citizen data scientists, machine learning developers, artificial intelligence enthusiasts, or anyone looking to automatically build machine learning models using the features offered by open source tools, Microsoft Azure Machine Learning, AWS, and Google Cloud Platform will find this book useful. Beginner-level knowledge of building ML models is required to get the best out of this book. Prior experience in using enterprise cloud platforms is beneficial.

What you will learn

  • Explore AutoML fundamentals, underlying methods, and techniques
  • Assess AutoML aspects such as algorithm selection, auto featurization, and hyperparameter tuning in an applied scenario
  • Find out the difference between cloud and open source software (OSS) offerings
  • Implement AutoML in enterprise cloud to deploy ML models and pipelines
  • Build explainable AutoML pipelines with transparency
  • Understand automated feature engineering and time series forecasting
  • Automate data science modeling tasks to implement ML solutions easily and focus on more complex problems

Product Details

Publication date: Feb 18, 2021
Length: 312 pages
Edition: 1st
Language: English
ISBN-13: 9781800567689





Table of Contents

14 Chapters
Section 1: Introduction to Automated Machine Learning
Chapter 1: A Lap around Automated Machine Learning
Chapter 2: Automated Machine Learning, Algorithms, and Techniques
Chapter 3: Automated Machine Learning with Open Source Tools and Libraries
Section 2: AutoML with Cloud Platforms
Chapter 4: Getting Started with Azure Machine Learning
Chapter 5: Automated Machine Learning with Microsoft Azure
Chapter 6: Machine Learning with AWS
Chapter 7: Doing Automated Machine Learning with Amazon SageMaker Autopilot
Chapter 8: Machine Learning with Google Cloud Platform
Chapter 9: Automated Machine Learning with GCP
Section 3: Applied Automated Machine Learning
Chapter 10: AutoML in the Enterprise
Other Books You May Enjoy

Customer reviews

Rating distribution
4.5 (15 Ratings)
5 star 60%
4 star 33.3%
3 star 0%
2 star 6.7%
1 star 0%

Top Reviews

MLEngineer Mar 29, 2021
Rating: 5/5
This book contains a nice overview of the 'big-picture' of Machine Learning. Machine Learning pipelines involve a lot of steps and after iterating on a few models at production every ML Professional should look at the big machine learning picture. The book talks about data gathering, hyper-parameter optimization, working on various clouds, and never forgetting about the domain/business/production of each ML model. This is a very nice book.
Amazon Verified review

Lucinda Linde Mar 20, 2021
Rating: 5/5
Disclaimer: This review has been requested by the publisher, and I am giving my honest review of this book. This review is based on reading the book. As with any Packt publication, it's also necessary to try out the code, which I will at a later point in time.

Overview

Automated Machine Learning is a very helpful primer on the up-to-date options available to use automated machine learning. This book is helpful to someone who has built ML models and wants to automate some of the more repetitive parts. I am looking into taking the AWS Cloud Practitioner exam. This will help me understand some of the AWS cloud offerings.

What I like about this book:

The book starts by giving a framework by which to compare the different auto-ML options. Machine Learning techniques have evolved to have a mind-numbing number of parameters to tune. To help data scientists optimize and scale building model, automation has become more important to realizing the benefits of these new models. The main three things to automate are Feature Engineering, Hyperparameter and model selection, and Deep Learning. I like that the author keeps this framework in the reader's mind each time the material is covered in increasing depth.

The book starts with the big picture in Chapters 1-3, showing the open source and proprietary options for auto-ML. It's great to learn that there are free and open-source options to automate machine learning. There are entire chapters and parts of the book devoted to Google Colab, Linux, TPOT and other free versions.

Then there are multiple chapter deep dives on the major environments of ML: Microsoft Azure, Amazon Web Services (AWS) and Google Cloud. For each of these topics, very helpful screen shots are provided to:

  • Setup the environment, account; install initial libraries
  • Supply code to for example projects so that the reader can practice using auto-ML
  • Show what the output looks like, and explain the output

The visual frameworks, process flow diagrams, tables etc. put order to the inherent complexity, and provide a useful way comparing the major options (MS Azure, AWS and Google). Just looking at the structure of the tables brings out what’s important and the contents of the tables highlight what’s different.

Finally, I love the nerd humor that occasionally pops up. Makes for a fun reading experience.

One worry about this book

The screen shots are very helpful to set up the programs and examples especially right when this is published. One worry is that those screen shots will be out of date in a few months. These differences may cause confusion as readers try to implement these examples.

Overall, "Automated Machine Learning" is a really helpful introduction with some hands-on initial examples into the options to use when automating complex machine learning models. Auto-ML automates the repetitive and sprawling tasks to building machine learning models with lots of features and parameters.
Amazon Verified review

Adwait Ullal Aug 29, 2021
Rating: 5/5
Note: I was provided with a free eBook copy in exchange for an unbiased review.

This book, Automated Machine Learning, provides a succinct summary on the state of AutoML and the enterprise.

The book is divided into three sections:
- Introduction to AutoML
- State of AutoML in the cloud(s)
- AutoML in the Enterprise

The topics in the book are organized well for anyone who wants to understand AutoML.

What I liked about:
- Depending on your skill level, you can choose a Section of interest (i.e. an experienced ML developer can jump directly to the cloud section)
- Logical progression
- Coverage of the top three cloud providers (AWS, Azure & GCP)

Room for improvement:
- AutoML in the Enterprise could use some more organization and/or topics in terms of smaller chapters, roadmaps, etc.

In summary, this is a good book to quick understanding of AutoML across the top three cloud platforms.
Amazon Verified review

Julie Zhu Apr 19, 2021
Rating: 5/5
As a business owner and building machine learning models, the goal is to implement the models into production automation. This book has provided a full stack of implementations including the well known AutoML tools such as Microsoft Azure, AWS and Google Cloud step by step, very instrumental and practical book to follow. I would strongly recommend this highly sought book.
Amazon Verified review

David G Mar 23, 2021
Rating: 5/5
Automated machine learning is a highly sought after skill in modern ML development stack, and it is quickly becoming part of all modern AI platforms. I wanted a breadth first approach of the topic, with an overview of different cloud AutoML technologies. This book provides exactly the right amount of breadth and depth into multiple cloud platforms, and open source toolkits. Highly recommended.
Amazon Verified review
