Throughout this book, you will learn both theoretical and practical aspects of AutoML systems. More importantly, you will practice your skills by developing an AutoML system from scratch.
What will you learn?
Core components of AutoML systems
In this section, you will review the following core components of AutoML systems:
- Automated feature preprocessing
- Automated algorithm selection
- Hyperparameter optimization
A good understanding of these core components will help you build a mental map of how AutoML systems work.
Automated feature preprocessing
When you are dealing with ML problems, you usually have a relational dataset that contains various types of data, and each type needs to be treated appropriately before training ML algorithms.
For example, if you are dealing with numerical data, you may scale it by applying methods such as min-max scaling or variance scaling.
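As a minimal sketch of what both approaches look like in scikit-learn (the toy feature matrix here is made up for illustration):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# A toy numerical feature matrix: two features on very different scales
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])

# Min-max scaling squeezes each feature into the [0, 1] range
X_minmax = MinMaxScaler().fit_transform(X)

# Variance (standard) scaling centers each feature and divides by its standard deviation
X_standard = StandardScaler().fit_transform(X)

print(X_minmax)
print(X_standard)
```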
For textual data, you may want to remove stop-words such as a, an, and the, and perform operations such as stemming, parsing, and tokenization.
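A minimal sketch of stop-word removal and tokenization using scikit-learn's CountVectorizer; stemming would require an additional library such as NLTK, and the example sentences are made up:

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["The cat sat on a mat", "A dog chased the cat"]

# CountVectorizer tokenizes the text and drops common English stop words
vectorizer = CountVectorizer(stop_words="english")
X_text = vectorizer.fit_transform(docs)

print(sorted(vectorizer.vocabulary_))  # remaining tokens after stop-word removal
print(X_text.toarray())                # token counts per document
```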
For categorical data, you may need to encode it using methods such as one-hot encoding, dummy coding, and feature hashing.
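Here is a minimal sketch of one-hot encoding and dummy coding with pandas (feature hashing is available separately, for example via scikit-learn's FeatureHasher); the color column is an invented example:

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# One-hot encoding: one binary column per category
one_hot = pd.get_dummies(df["color"], prefix="color")

# Dummy coding: drop one category to avoid redundant columns
dummy = pd.get_dummies(df["color"], prefix="color", drop_first=True)

print(one_hot)
print(dummy)
```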
What about datasets with a very large number of features? For example, when you have thousands of features, how many of them are actually useful? Would it be better to reduce dimensionality using methods such as Principal Component Analysis (PCA)?
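As a sketch of how this might look, the following reduces a synthetic high-dimensional dataset to the components that explain most of its variance; the dataset sizes and variance threshold are arbitrary choices for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA

# A synthetic dataset with many (mostly uninformative) features
X, y = make_classification(n_samples=500, n_features=1000,
                           n_informative=20, random_state=42)

# Keep enough principal components to explain ~95% of the variance
pca = PCA(n_components=0.95, svd_solver="full")
X_reduced = pca.fit_transform(X)

print(X.shape, "->", X_reduced.shape)
```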
What if you have different formats of data, such as video, audio, and image? How do you process each of them?
For example, for image data, you may apply transformations such as rescaling the images to a common shape, or segmentation to separate certain regions.
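A minimal sketch of both operations, assuming scikit-image is available; Otsu thresholding stands in here for more sophisticated segmentation methods:

```python
from skimage import data, filters
from skimage.transform import resize

# A sample grayscale image shipped with scikit-image
image = data.camera()

# Rescale to a common shape so every image has the same dimensions
image_resized = resize(image, (64, 64))

# A very simple segmentation: threshold the image to separate foreground from background
threshold = filters.threshold_otsu(image_resized)
mask = image_resized > threshold

print(image.shape, "->", image_resized.shape, "foreground pixels:", int(mask.sum()))
```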
Automated algorithm selection
Once you are done with feature preprocessing, you need to find a suitable set of algorithms for training and evaluation.
Every ML algorithm is suited to solving certain kinds of problems. Consider clustering algorithms such as k-means, hierarchical clustering, spectral clustering, and DBSCAN. We are familiar with k-means, but what about the others? Each of these algorithms has its own application areas, and each may perform better than the others depending on the distributional properties of the dataset.
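To see why the right choice depends on the data, the following sketch compares k-means and DBSCAN on a synthetic two-moons dataset, whose non-convex cluster shapes tend to favor density-based methods; the parameter values are chosen for illustration only:

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Two interleaving half-moons: clusters that are not spherical
X, y_true = make_moons(n_samples=300, noise=0.05, random_state=42)

# k-means assumes roughly spherical, similarly sized clusters
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(X)

# DBSCAN groups points by density and can follow arbitrary shapes
dbscan_labels = DBSCAN(eps=0.3).fit_predict(X)

print("k-means ARI:", adjusted_rand_score(y_true, kmeans_labels))
print("DBSCAN  ARI:", adjusted_rand_score(y_true, dbscan_labels))
```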
AutoML pipelines help you choose the right algorithm from a set of suitable candidates for a given problem.
Hyperparameter optimization
Every ML algorithm has one or more hyperparameters; you have already seen this with k-means, where you need to choose the number of clusters. But ML algorithms are not the only things with hyperparameters: feature preprocessing methods also have hyperparameters, and those need fine-tuning as well.
Tuning hyperparameters is crucial to a model's success, and an AutoML pipeline helps you define the ranges of hyperparameters you would like to experiment with, ultimately producing the best-performing ML pipeline.
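As a minimal sketch of what such tuning looks like in practice, the following grid search tunes a preprocessing hyperparameter (the number of PCA components) and a model hyperparameter (the SVM's C) at the same time; the search ranges are arbitrary:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

# Preprocessing steps and the model chained into one pipeline
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("pca", PCA()),
    ("svc", SVC()),
])

# Both the preprocessing step (PCA) and the model (SVC) expose hyperparameters
param_grid = {
    "pca__n_components": [10, 20, 40],
    "svc__C": [0.1, 1.0, 10.0],
}

search = GridSearchCV(pipeline, param_grid, cv=3)
search.fit(X, y)

print(search.best_params_, search.best_score_)
```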
Building prototype subsystems for each component
Throughout the book, you will build each core component of an AutoML system from scratch and see how the parts interact with each other.
Being able to build such systems from scratch will give you a deeper understanding of the process, as well as of the inner workings of popular AutoML libraries.
Putting it all together as an end-to-end AutoML system
Once you have gone through all the chapters, you will have a good understanding of the components and how they work together to create ML pipelines. You will then use this knowledge to write AutoML pipelines from scratch and tweak them to suit the set of problems you are aiming to solve.
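As a preview of where the book is heading, here is a toy sketch of an end-to-end search that combines preprocessing, algorithm selection, and hyperparameter tuning; it is far from a full AutoML system, and the candidate algorithms and parameter ranges are chosen only for illustration:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Candidate algorithms and their hyperparameter grids
candidates = {
    "logistic_regression": (LogisticRegression(max_iter=1000),
                            {"model__C": [0.1, 1.0, 10.0]}),
    "random_forest": (RandomForestClassifier(random_state=42),
                      {"model__n_estimators": [50, 100]}),
}

best_name, best_score, best_model = None, -1.0, None
for name, (model, grid) in candidates.items():
    # Each candidate gets the same preprocessing, then its own hyperparameter search
    pipeline = Pipeline([("scaler", StandardScaler()), ("model", model)])
    search = GridSearchCV(pipeline, grid, cv=3)
    search.fit(X_train, y_train)
    if search.best_score_ > best_score:
        best_name, best_score, best_model = name, search.best_score_, search.best_estimator_

print("Selected:", best_name, "test accuracy:", best_model.score(X_test, y_test))
```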