Throughout this book, you will learn both theoretical and practical aspects of AutoML systems. More importantly, you will practice your skills by developing an AutoML system from scratch.
What will you learn?
Core components of AutoML systems
In this section, you will review the following core components of AutoML systems:
- Automated feature preprocessing
- Automated algorithm selection
- Hyperparameter optimization
A good understanding of these core components will help you build a mental map of how AutoML systems work.
Automated feature preprocessing
When you are dealing with ML problems, you usually have a relational dataset that contains various types of data, and each type needs to be treated appropriately before training ML algorithms.
For example, if you are dealing with numerical data, you may scale it by applying methods such as min-max scaling or variance scaling.
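As a minimal sketch of what both approaches look like in scikit-learn (the toy feature matrix here is made up for illustration):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# A toy numerical feature matrix: two features on very different scales
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])

# Min-max scaling squeezes each feature into the [0, 1] range
X_minmax = MinMaxScaler().fit_transform(X)

# Variance (standard) scaling centers each feature and divides by its standard deviation
X_standard = StandardScaler().fit_transform(X)

print(X_minmax)
print(X_standard)
```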
For textual data, you may want to remove stop-words such as a, an, and the, and perform operations such as stemming, parsing, and tokenization.
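A minimal sketch of stop-word removal and tokenization using scikit-learn's CountVectorizer; stemming would require an additional library such as NLTK, and the example sentences are made up:

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["The cat sat on a mat", "A dog chased the cat"]

# CountVectorizer tokenizes the text and drops common English stop words
vectorizer = CountVectorizer(stop_words="english")
X_text = vectorizer.fit_transform(docs)

print(sorted(vectorizer.vocabulary_))  # remaining tokens after stop-word removal
print(X_text.toarray())                # token counts per document
```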
For categorical data, you may need to encode it using methods such as one-hot encoding, dummy coding, and feature hashing.
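Here is a minimal sketch of one-hot encoding and dummy coding with pandas (feature hashing is available separately, for example via scikit-learn's FeatureHasher); the color column is an invented example:

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# One-hot encoding: one binary column per category
one_hot = pd.get_dummies(df["color"], prefix="color")

# Dummy coding: drop one category to avoid redundant columns
dummy = pd.get_dummies(df["color"], prefix="color", drop_first=True)

print(one_hot)
print(dummy)
```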
What about datasets with a very large number of features? For example, when you have thousands of features, how many of them are actually useful? Would it be better to reduce dimensionality using methods such as Principal Component Analysis (PCA)?
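As a sketch of how this might look, the following reduces a synthetic high-dimensional dataset to the components that explain most of its variance; the dataset sizes and variance threshold are arbitrary choices for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA

# A synthetic dataset with many (mostly uninformative) features
X, y = make_classification(n_samples=500, n_features=1000,
                           n_informative=20, random_state=42)

# Keep enough principal components to explain ~95% of the variance
pca = PCA(n_components=0.95, svd_solver="full")
X_reduced = pca.fit_transform(X)

print(X.shape, "->", X_reduced.shape)
```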
What if you have different formats of data, such as video, audio, and image? How do you process each of them?
For example, for image data, you may apply transformations such as rescaling the images to a common shape, or segmentation to separate certain regions.
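A minimal sketch of both operations, assuming scikit-image is available; Otsu thresholding stands in here for more sophisticated segmentation methods:

```python
from skimage import data, filters
from skimage.transform import resize

# A sample grayscale image shipped with scikit-image
image = data.camera()

# Rescale to a common shape so every image has the same dimensions
image_resized = resize(image, (64, 64))

# A very simple segmentation: threshold the image to separate foreground from background
threshold = filters.threshold_otsu(image_resized)
mask = image_resized > threshold

print(image.shape, "->", image_resized.shape, "foreground pixels:", int(mask.sum()))
```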
Automated algorithm selection
Once you are done with feature preprocessing, you need to find a suitable set of algorithms for training and evaluation.
Every ML algorithm is suited to solving certain kinds of problems. Consider clustering algorithms such as k-means, hierarchical clustering, spectral clustering, and DBSCAN. We are familiar with k-means, but what about the others? Each of these algorithms has its own application areas, and each may perform better than the others depending on the distributional properties of the dataset.
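To see why the right choice depends on the data, the following sketch compares k-means and DBSCAN on a synthetic two-moons dataset, whose non-convex cluster shapes tend to favor density-based methods; the parameter values are chosen for illustration only:

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Two interleaving half-moons: clusters that are not spherical
X, y_true = make_moons(n_samples=300, noise=0.05, random_state=42)

# k-means assumes roughly spherical, similarly sized clusters
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(X)

# DBSCAN groups points by density and can follow arbitrary shapes
dbscan_labels = DBSCAN(eps=0.3).fit_predict(X)

print("k-means ARI:", adjusted_rand_score(y_true, kmeans_labels))
print("DBSCAN  ARI:", adjusted_rand_score(y_true, dbscan_labels))
```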
AutoML pipelines help you choose the right algorithm from a set of suitable candidates for a given problem.
Hyperparameter optimization
Every ML algorithm has one or more hyperparameters; you have already seen this with k-means, where you need to choose the number of clusters. But ML algorithms are not the only things with hyperparameters: feature preprocessing methods also have hyperparameters, and those need fine-tuning as well.
Tuning hyperparameters is crucial to a model's success, and an AutoML pipeline helps you define the ranges of hyperparameters you would like to experiment with, ultimately producing the best-performing ML pipeline.
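As a minimal sketch of what such tuning looks like in practice, the following grid search tunes a preprocessing hyperparameter (the number of PCA components) and a model hyperparameter (the SVM's C) at the same time; the search ranges are arbitrary:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

# Preprocessing steps and the model chained into one pipeline
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("pca", PCA()),
    ("svc", SVC()),
])

# Both the preprocessing step (PCA) and the model (SVC) expose hyperparameters
param_grid = {
    "pca__n_components": [10, 20, 40],
    "svc__C": [0.1, 1.0, 10.0],
}

search = GridSearchCV(pipeline, param_grid, cv=3)
search.fit(X, y)

print(search.best_params_, search.best_score_)
```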
Building prototype subsystems for each component
Throughout the book, you will build each core component of an AutoML system from scratch and see how the parts interact with each other.
Being able to build such systems from scratch will give you a deeper understanding of the process, as well as of the inner workings of popular AutoML libraries.
Putting it all together as an end-to-end AutoML system
Once you have gone through all the chapters, you will have a good understanding of the components and how they work together to create ML pipelines. You will then use this knowledge to write AutoML pipelines from scratch and tweak them to suit the set of problems you are aiming to solve.
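As a preview of where the book is heading, here is a toy sketch of an end-to-end search that combines preprocessing, algorithm selection, and hyperparameter tuning; it is far from a full AutoML system, and the candidate algorithms and parameter ranges are chosen only for illustration:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Candidate algorithms and their hyperparameter grids
candidates = {
    "logistic_regression": (LogisticRegression(max_iter=1000),
                            {"model__C": [0.1, 1.0, 10.0]}),
    "random_forest": (RandomForestClassifier(random_state=42),
                      {"model__n_estimators": [50, 100]}),
}

best_name, best_score, best_model = None, -1.0, None
for name, (model, grid) in candidates.items():
    # Each candidate gets the same preprocessing, then its own hyperparameter search
    pipeline = Pipeline([("scaler", StandardScaler()), ("model", model)])
    search = GridSearchCV(pipeline, grid, cv=3)
    search.fit(X_train, y_train)
    if search.best_score_ > best_score:
        best_name, best_score, best_model = name, search.best_score_, search.best_estimator_

print("Selected:", best_name, "test accuracy:", best_model.score(X_test, y_test))
```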