Hands-On Automated Machine Learning

Hands-On Automated Machine Learning: A beginner's guide to building automated machine learning systems using AutoML and Python

By Das and Mert Cakmak
€8.99 €26.99
eBook Apr 2018 282 pages 1st Edition
eBook
€8.99 €26.99
Paperback
€32.99
Subscription
Free Trial
Renews at €18.99p/m

What do you get with eBook?

  • Instant access to your Digital eBook purchase
  • Download this book in EPUB and PDF formats
  • Access this title in our online reader with advanced features
  • DRM FREE - Read whenever, wherever and however you want

Introduction to Machine Learning Using Python

The last chapter introduced you to the world of machine learning (ML). In this chapter, we will develop the ML foundations required for building and using Automated ML (AutoML) platforms. It is not always clear how ML is best applied or what it takes to implement it. However, ML tools are becoming more straightforward to use, and AutoML platforms are making ML accessible to a broader audience. In the future, there will undoubtedly be closer collaboration between people and machines.

The future of ML may require people to prepare data for its consumption and identify use cases for implementation. More importantly, people are needed to interpret the results and audit the ML system, checking whether it follows the right and best approach to solving a problem. The future looks pretty amazing, but we need to build that...

Technical requirements

All the code examples can be found in the Chapter 02 folder on GitHub.

Machine learning

Machine learning dates back decades. It was born from the theory that computers can learn without being explicitly programmed to perform specific tasks. The iterative aspect of ML is essential because machines need to adapt continually to new data. They need to learn from historical data, optimize for better computations, and generalize so that they produce sound results on data they have not seen.

We are all aware of rule-based systems, where a machine executes a set of predefined conditions and provides the results. How great would it be if machines learned these patterns by themselves, delivered the results, and explained the rules they discovered? This is ML. It is a broad term for the various methods and algorithms that machines use to learn from data. As a branch of artificial intelligence (AI), ML algorithms are quite often used to discover...

Linear regression

Let's begin our triple W session with linear regression first.

What is linear regression?

Linear regression is the most traditional and most widely used form of regression analysis. It has been studied rigorously and is used widely for practical purposes. Linear regression is a method for determining the relationship between a dependent variable (y) and one or more independent variables (x). This derived relationship can be used to predict an unobserved y from observed x's. Mathematically, if x is an independent variable (commonly known as the predictor) and y is a dependent variable (also known as the target), the relationship is expressed as follows:

$$y = mx + b$$

Where m is the slope of the line, b is the intercept of the best-fit regression line, and...
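
As a rough illustration of how m and b can be obtained, the following sketch fits a line by ordinary least squares using only the Python standard library. The excerpt does not show the book's own code, so the function name fit_line and the sample data are assumptions made for this example.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return slope m and intercept b of the least-squares line y = m*x + b."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    b = y_bar - m * x_bar
    return m, b


# Hypothetical data that lies exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

m, b = fit_line(xs, ys)
predicted = m * 5.0 + b  # predict y for a new x of 5.0
```

With real data the points will not sit exactly on a line, and m and b describe the line that minimizes the squared prediction errors.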

Important evaluation metrics – regression algorithms

Assessing the value of an ML model is a two-phase process. First, the model has to be evaluated for its statistical accuracy, that is, whether the statistical hypotheses are correct, whether model performance is good, and whether that performance holds true for other independent datasets. This is accomplished using several model evaluation metrics. Then, the model is evaluated to see whether the results meet the business requirements and whether the stakeholders genuinely get insights or useful predictions out of it.

A regression model is evaluated based on the following metrics:

  • Mean absolute error (MAE): It is the average of the absolute values of the prediction errors. The prediction error is defined as the difference between the predicted and actual values. This metric gives an idea about the magnitude of the error. However, we cannot judge...
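
The MAE calculation can be sketched in a few lines of plain Python. The function name and the sample values below are assumptions chosen for illustration, not taken from the book.

```python
def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between predicted and actual values."""
    errors = [abs(p - a) for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors)


# Hypothetical actual and predicted values.
actual = [3.0, 5.0, 7.0]
predicted = [2.5, 5.0, 8.0]

mae = mean_absolute_error(actual, predicted)  # (0.5 + 0.0 + 1.0) / 3
```

Because the absolute value discards the sign, the result shows how large the errors are on average but not whether the model tends to overpredict or underpredict.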

Logistic regression

Let's start again with the triple W for logistic regression. To reiterate the triple W method, we first ask what the algorithm is, followed by where it can be used, and finally by what method we can implement the model.

What is logistic regression?

Logistic regression can be thought of as an extension to linear regression algorithms. It fundamentally works like linear regression, but it is meant for discrete or categorical outcomes.
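
One way to see the connection to linear regression is the sketch below, which passes a linear combination through the sigmoid function to get a probability and then thresholds it into a class. The excerpt does not give the formula, so this is the standard textbook formulation rather than the book's code; the coefficients m and b and the 0.5 threshold are hypothetical.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, m, b):
    """Probability of the positive class for a single feature value x."""
    return sigmoid(m * x + b)


def predict_class(x, m, b, threshold=0.5):
    """Turn the probability into a discrete outcome (1 or 0)."""
    return 1 if predict_proba(x, m, b) >= threshold else 0


# Hypothetical coefficients; in practice they are learned from training data.
m, b = 1.5, -3.0

p_low = predict_proba(0.0, m, b)       # well below 0.5
p_boundary = predict_proba(2.0, m, b)  # m*x + b = 0, so exactly 0.5
label_high = predict_class(4.0, m, b)  # class 1
```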

Where is logistic regression used?

Logistic regression is applied in the case of discrete target variables such...

Important evaluation metrics – classification algorithms

Most of the metrics used to assess a classification model are based on the values that we get in the four quadrants of a confusion matrix. Let's begin this section by understanding what it is:

  • Confusion matrix: It is the cornerstone of evaluating a classification model (that is, a classifier). As the name suggests, the matrix is sometimes confusing. Let's try to visualize the confusion matrix as two axes in a graph. The x axis label is Prediction, with two values: Positive and Negative. Similarly, the y axis label is Actual, with the same two values, Positive and Negative, as shown in the following figure. This matrix is a table that contains the counts of actual and predicted values produced by a classifier:
  • If we try to deduce information about each quadrant in the matrix:
    • Quadrant...
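
The four quadrants can be counted directly from paired actual and predicted labels. The sketch below uses the conventional names true positive (TP), false positive (FP), false negative (FN), and true negative (TN); since the excerpt is cut off before it names the quadrants, these labels and the sample data are assumptions.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count the four quadrants of a binary confusion matrix."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted positive, actually positive
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        elif a == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


# Hypothetical labels: 1 is the positive class, 0 the negative class.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

counts = confusion_counts(actual, predicted)
# Accuracy is the share of predictions that fall in the two "true" quadrants.
accuracy = (counts["TP"] + counts["TN"]) / len(actual)
```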

Decision trees

Decision trees are extensively used classifiers in the ML world because of their transparency in representing the rules that drive a classification or prediction. Let us ask this algorithm the triple W questions to learn more about it.

What are decision trees?

Decision trees are arranged in a hierarchical tree-like structure and are easy to explain and interpret. They are not susceptible to outliers. A decision tree is created by recursive partitioning: the training data is split repeatedly into groups with the objective of finding homogeneous, pure subgroups, that is, subgroups containing data of only one class.

Outliers are values that lie far away from other data points and distort the data distribution.
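
The idea of a pure subgroup can be made concrete with an impurity score. The sketch below uses Gini impurity, which is one common choice; the excerpt does not say which criterion the book uses, so treat the measure, the function names, and the "stay"/"leave" labels as assumptions.

```python
from collections import Counter


def gini_impurity(labels):
    """0.0 for a pure group; larger values mean a more mixed group."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_impurity(left, right):
    """Size-weighted impurity of the two groups produced by a split."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


pure = gini_impurity(["stay", "stay", "stay"])             # only one class
mixed = gini_impurity(["stay", "leave", "stay", "leave"])  # evenly mixed
perfect_split = split_impurity(["stay", "stay"], ["leave", "leave"])
```

A tree-building procedure would evaluate candidate splits like this and keep the one with the lowest weighted impurity, then repeat the process within each resulting group.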
...

Support Vector Machines

SVM is a supervised ML algorithm used primarily for classification tasks; however, it can be used for regression problems as well.

What is SVM?

SVM is a classifier that works on the principle of separating hyperplanes. Given a training dataset, the algorithm finds a hyperplane that maximizes the separation between the classes and uses the resulting partitions to predict the classes of new data. A hyperplane is a subspace with one dimension less than its ambient space. This means that a line is a hyperplane for a two-dimensional dataset.
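
To illustrate what a separating hyperplane does, the sketch below classifies two-dimensional points by the side of a line on which they fall and measures their distance from it. It does not train an SVM: the weights w and offset b are hypothetical values picked by hand, whereas a real SVM would learn them so that the separation between classes is as large as possible.

```python
import math


def decision_value(point, w, b):
    """Signed score w . x + b; its sign tells which side of the hyperplane x is on."""
    return sum(wi * xi for wi, xi in zip(w, point)) + b


def classify(point, w, b):
    """Assign class +1 or -1 depending on the side of the hyperplane."""
    return 1 if decision_value(point, w, b) >= 0 else -1


def distance_to_hyperplane(point, w, b):
    """Perpendicular distance from the point to the hyperplane."""
    norm = math.sqrt(sum(wi ** 2 for wi in w))
    return abs(decision_value(point, w, b)) / norm


# Hypothetical hyperplane for 2-D data: the line x1 + x2 = 3.
w, b = [1.0, 1.0], -3.0

label_a = classify([3.0, 2.0], w, b)  # above the line
label_b = classify([0.0, 1.0], w, b)  # below the line
dist = distance_to_hyperplane([3.0, 2.0], w, b)
```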

Where is SVM used?

SVM...

k-Nearest Neighbors

Before we build a KNN model for the HR attrition dataset, let us understand KNN's triple W.

What is k-Nearest Neighbors?

KNN is one of the most straightforward algorithms: it stores all available data points and predicts new data based on a distance or similarity measure such as Euclidean distance. It makes predictions using the training dataset directly. However, it is resource intensive at prediction time, because it has no training phase and requires all the data to be present in memory to predict new instances.

Euclidean distance is calculated as the square root of the sum of the squared differences between the coordinates of two points.
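
The distance calculation and the prediction step can be sketched together as follows. This is a minimal standard-library version rather than the book's code; the function names, the six training points, and the "stay"/"leave" labels are hypothetical.

```python
import math
from collections import Counter


def euclidean_distance(p, q):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the majority label among the k training points closest to query."""
    distances = sorted(
        (euclidean_distance(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical training data: two well-separated groups.
train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 5.5), (6.5, 7.0)]
train_labels = ["stay", "stay", "stay", "leave", "leave", "leave"]

d = euclidean_distance((0.0, 0.0), (3.0, 4.0))
pred = knn_predict(train_points, train_labels, (1.2, 1.4), k=3)
```

Note that every prediction scans the whole training set, which is why the approach needs all the data in memory.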
...

Ensemble methods

Ensembling models is a robust approach to enhancing the performance of predictive models. It is a well-thought-out strategy that is very similar to a power-packed word: TEAM! Any task done by a team leads to significant accomplishments.

What are ensemble models?

Likewise, in the ML world, an ensemble model is a team of models operating together to enhance the result of their work. Technically, ensemble models comprise several supervised learning models that are trained individually, and their results are merged in various ways to achieve the final prediction. This combined result usually has higher predictive power than the result of any of its constituent learning algorithms on its own.
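
Majority voting is one simple way of merging the results of individually trained models. The sketch below assumes the models have already produced their predictions; the three prediction lists and the true labels are made-up values chosen so that each model errs on a different sample.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model prediction lists by taking the most common label per sample."""
    combined = []
    for votes in zip(*predictions_per_model):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


def accuracy(truth, predictions):
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(truth, predictions)) / len(truth)


# Hypothetical true labels and predictions from three models.
truth = [1, 0, 1, 1]
model_a = [1, 0, 1, 0]
model_b = [1, 1, 1, 1]
model_c = [0, 0, 1, 1]

ensemble = majority_vote([model_a, model_b, model_c])
individual_scores = [accuracy(truth, m) for m in (model_a, model_b, model_c)]
ensemble_score = accuracy(truth, ensemble)
```

In this contrived case the vote corrects every individual mistake. That only happens when the models make different errors, so an ensemble is not guaranteed to beat its best member.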

Mostly, there are...

Comparing the results of classifiers

We have created six classification models on the HR attrition dataset. The following table summarizes the evaluation scores for each model:

The random forest model appears to be the winner among all six models, with 99% accuracy. We do not need to improve the random forest model further; instead, we need to check whether it generalizes well to a new dataset and whether the results are overfitting the training dataset. One way to do this is cross-validation.

Cross-validation

Cross-validation is a way to evaluate the accuracy of a model on a dataset that was not used for training, that is, a sample of data that is unknown to the trained model. This helps confirm that the model generalizes to independent datasets when it is deployed in a production environment. One method is to divide the dataset into two sets: a train set and a test set. We demonstrated this method in our previous examples.

Another popular and more robust method is the k-fold cross-validation approach, where the dataset is partitioned into k subsamples of equal size, where k is a positive integer. During the training phase, k-1 subsamples are used to train the model and the remaining subsample is used to test it. This process is repeated k times, with each of the k subsamples used exactly once to test the model. The evaluation results are then averaged or combined...
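
The partitioning step can be sketched as a generator that yields train and test indices for each fold. This is a simplified illustration with no shuffling; the function name is an assumption, and it requires k of at least 2 so that some data is always left for training.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs, one pair per fold."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(indices[start:start + size])
        start += size
    for i in range(k):
        test_idx = folds[i]
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx


# Six samples split into three folds of two samples each.
splits = list(k_fold_indices(6, 3))
all_test_indices = sorted(idx for _, test_idx in splits for idx in test_idx)
```

Each sample index appears in exactly one test fold, so every observation is used for testing exactly once across the k rounds.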

Clustering

We will begin this section with a question. How do we start learning a new algorithm or a machine learning method? We start with the triple W. So, let's begin with that for the clustering method.

What is clustering?

Clustering is a technique for grouping similar data together so that each group has characteristics that distinguish it from the other groups. Data can be clustered together using various methods. One of them is rule-based, where the groups are formed based on certain predefined conditions, such as grouping customers based on their age or industry. Another method is to use ML algorithms to cluster data together.
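
The rule-based approach mentioned above can be sketched with a few predefined conditions. The age bands, function names, and customer records below are hypothetical and serve only to show the idea.

```python
def age_group(age):
    """Assign a customer to a group using predefined age thresholds."""
    if age < 30:
        return "under 30"
    if age < 50:
        return "30 to 49"
    return "50 and over"


def group_customers(customers):
    """Map each group name to the list of customer names that fall into it."""
    groups = {}
    for name, age in customers:
        groups.setdefault(age_group(age), []).append(name)
    return groups


# Hypothetical customers as (name, age) pairs.
customers = [("Ana", 24), ("Ben", 37), ("Chloe", 52), ("Dev", 29), ("Eli", 45)]
groups = group_customers(customers)
```

Here the boundaries are fixed in advance by a person. An ML clustering algorithm would instead derive the groups from the data itself.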

...

Summary

The journey through ML and its automation is a long one. The aim of this chapter was to familiarize ourselves with machine learning concepts and, most importantly, with scikit-learn and other Python packages, so that we can smoothly accelerate our learning in the next chapters. We created a linear regression model and six classification models, learned about clustering techniques, and compared the models with each other.

We used a single HR attrition dataset for creating all the classifiers. We observed that there are many similarities in the code. The libraries imported are all similar, except the one used to instantiate the machine learning class. The data preprocessing code is repeated in every example. The machine learning technique changes based on the task and the nature of the target attribute. Also, the evaluation methodology is the same for ML methods of the same type.

Do you think that...


Key benefits

  • Build automated modules for different machine learning components
  • Understand each component of a machine learning pipeline in depth
  • Learn to use different open source AutoML and feature engineering platforms

Description

AutoML is designed to automate parts of Machine Learning. Readily available AutoML tools are making data science practitioners’ work easy and are received well in the advanced analytics community. Automated Machine Learning covers the necessary foundation needed to create automated machine learning modules and helps you get up to speed with them in the most practical way possible. In this book, you’ll learn how to automate different tasks in the machine learning pipeline such as data preprocessing, feature selection, model training, model optimization, and much more. In addition to this, it demonstrates how you can use the available automation libraries, such as auto-sklearn and MLBox, and create and extend your own custom AutoML components for Machine Learning. By the end of this book, you will have a clearer understanding of the different aspects of automated Machine Learning, and you’ll be able to incorporate automation tasks using practical datasets. You can leverage your learning from this book to implement Machine Learning in your projects and get a step closer to winning various machine learning competitions.

Who is this book for?

If you’re a budding data scientist, data analyst, or Machine Learning enthusiast and are new to the concept of automated machine learning, this book is ideal for you. You’ll also find this book useful if you’re an ML engineer or data professional interested in developing quick machine learning pipelines for your projects. Prior exposure to Python programming will help you get the best out of this book.

What you will learn

  • Understand the fundamentals of Automated Machine Learning systems
  • Explore auto-sklearn and MLBox for AutoML tasks
  • Automate your preprocessing methods along with feature transformation
  • Enhance feature selection and generation using the Python stack
  • Assemble individual components of ML into a complete AutoML framework
  • Demystify hyperparameter tuning to optimize your ML models
  • Dive into Machine Learning concepts such as neural networks and autoencoders
  • Understand the information costs and trade-offs associated with AutoML

Product Details

Publication date: Apr 26, 2018
Length: 282 pages
Edition: 1st
Language: English
ISBN-13: 9781788622288

Packt Subscriptions

See our plans and pricing

€18.99 billed monthly
  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive Early access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Simple pricing, no contract

€189.99 billed annually
  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive Early access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Choose a DRM-free eBook or Video every month to keep
  • PLUS own as many other DRM-free eBooks or Videos as you like for just €5 each
  • Exclusive print discounts

€264.99 billed in 18 months
  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive Early access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Choose a DRM-free eBook or Video every month to keep
  • PLUS own as many other DRM-free eBooks or Videos as you like for just €5 each
  • Exclusive print discounts

Frequently bought together


  • Hands-On Automated Machine Learning: €32.99
  • Deep Reinforcement Learning Hands-On: €36.99
  • Mastering Machine Learning Algorithms: €36.99

Total: €106.97

Table of Contents

9 Chapters

  1. Introduction to AutoML
  2. Introduction to Machine Learning Using Python
  3. Data Preprocessing
  4. Automated Algorithm Selection
  5. Hyperparameter Optimization
  6. Creating AutoML Pipelines
  7. Dive into Deep Learning
  8. Critical Aspects of ML and Data Science Projects
  9. Other Books You May Enjoy

FAQs

How do I buy and download an eBook?

Where there is an eBook version of a title available, you can buy it from the book details for that title. Add either the standalone eBook or the eBook and print book bundle to your shopping cart. Your eBook will show in your cart as a product on its own. After completing checkout and payment in the normal way, you will receive your receipt on the screen containing a link to a personalised PDF download file. This link will remain active for 30 days. You can download backup copies of the file by logging in to your account at any time.

If you already have Adobe reader installed, then clicking on the link will download and open the PDF file directly. If you don't, then save the PDF file on your machine and download the Reader to view it.

Please Note: Packt eBooks are non-returnable and non-refundable.

Packt eBook and Licensing

When you buy an eBook from Packt Publishing, completing your purchase means you accept the terms of our licence agreement. Please read the full text of the agreement. In it we have tried to balance the need for the eBook to be usable for you, the reader, with our need to protect our rights as publishers and the rights of our authors. In summary, the agreement says:

  • You may make copies of your eBook for your own use onto any machine
  • You may not pass copies of the eBook on to anyone else

How can I make a purchase on your website?

If you want to purchase a video course, eBook, or Bundle (Print+eBook), please follow the steps below:

  1. Register on our website using your email address and a password.
  2. Search for the title by name or ISBN using the search option.
  3. Select the title you want to purchase.
  4. Choose the format you wish to purchase the title in; if you order the Print Book, you get a free eBook copy of the same title.
  5. Proceed with the checkout process (payment can be made using Credit Card, Debit Card, or PayPal).

Where can I access support around an eBook?

  • If you experience a problem with using or installing Adobe Reader, then contact Adobe directly.
  • To view the errata for the book, see www.packtpub.com/support and view the pages for the title you have.
  • To view your account details or to download a new copy of the book go to www.packtpub.com/account
  • To contact us directly if a problem is not resolved, use www.packtpub.com/contact-us

What eBook formats does Packt support?

Our eBooks are currently available in a variety of formats, such as PDF and ePub. In the future, this may well change with trends and developments in technology, but please note that our PDFs are not in Adobe eBook Reader format, which has greater restrictions on security.

You will need to use Adobe Reader v9 or later in order to read Packt's PDF eBooks.

What are the benefits of eBooks?

  • You can get the information you need immediately
  • You can easily take them with you on a laptop
  • You can download them an unlimited number of times
  • You can print them out
  • They are copy-paste enabled
  • They are searchable
  • There is no password protection
  • They are lower priced than print
  • They save resources and space

What is an eBook?

Packt eBooks are a complete electronic version of the print edition, available in PDF and ePub formats. Every piece of content down to the page numbering is the same. Because we save the costs of printing and shipping the book to you, we are able to offer eBooks at a lower cost than print editions.

When you have purchased an eBook, simply log in to your account and click on the link in Your Download Area. We recommend saving the file to your hard drive before opening it.

For optimal viewing of our eBooks, we recommend you download and install the free Adobe Reader version 9.