You're reading from Ensemble Machine Learning Cookbook: Over 35 practical recipes to explore ensemble machine learning techniques using Python

Product type: Paperback

Published in: Jan 2019

Publisher: Packt

ISBN-13: 9781789136609

Length: 336 pages

Edition: 1st Edition

Languages: Python

Tools: Scikit-learn

Concepts: Machine Learning

Authors (2): Vijayalakshmi Natarajan, Dipayan Sarkar

Chapter 1, Get Closer to Your Data, explores a dataset through hands-on exploratory data analysis in Python, using statistical methods and visualization.

Chapter 2, Getting Started with Ensemble Machine Learning, explains what ensemble learning is and how it can help in real-life scenarios. It covers the basic ensemble techniques: averaging, weighted averaging, and max-voting. These techniques underpin the more advanced methods, and understanding them prepares readers for the later chapters.
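As a rough illustration of the three basic techniques named above, the following sketch combines the outputs of three models for a single sample. The function names and the sample values are made up for this example and are not taken from the book.

```python
from collections import Counter


def average(predictions):
    """Simple averaging: the mean of the models' numeric predictions."""
    return sum(predictions) / len(predictions)


def weighted_average(predictions, weights):
    """Weighted averaging: each prediction counts in proportion to its weight."""
    return sum(p * w for p, w in zip(predictions, weights)) / sum(weights)


def max_vote(labels):
    """Max-voting: the class label predicted by the largest number of models."""
    return Counter(labels).most_common(1)[0][0]


# Hypothetical outputs of three models for one sample.
probabilities = [0.5, 0.75, 1.0]

avg = average(probabilities)                        # every model counts equally
wavg = weighted_average(probabilities, [2, 1, 1])   # first model counts double
vote = max_vote(["spam", "ham", "spam"])            # majority class label
```

Averaging and weighted averaging apply to numeric outputs such as probabilities or regression values, while max-voting applies to predicted class labels.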

Chapter 3, Resampling Methods, introduces the various types of resampling methods used with machine learning algorithms. The advantages and disadvantages of each method are explained, along with the code needed to run each type of sampling.

Chapter 4, Statistical and Machine Learning Algorithms, introduces a handful of algorithms that will be useful when we build ensembles of multiple heterogeneous algorithms. This chapter uses scikit-learn to prepare all the algorithms to be used.
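To give a feel for the resampling methods mentioned above, here is a minimal sketch of one common method, k-fold splitting, written in plain Python. Choosing k-fold as the example is an assumption made for illustration, and `k_fold_indices` is a hypothetical helper rather than code from the book.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# Split 10 samples into 3 folds; each sample is held out exactly once.
folds = list(k_fold_indices(10, 3))
```

In practice the data would usually be shuffled before splitting. The sketch keeps the folds contiguous so that the result is easy to inspect.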

Chapter 5, Bag the Models with Bagging, explains bootstrap aggregation, also known as bagging: models are trained on bootstrap samples of the data and their results are then aggregated.
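The idea behind bagging can be sketched in a few lines of plain Python: draw bootstrap samples with replacement, fit a base model on each, and average the predictions. The base learner here (a least-squares line) and the names `fit_line` and `bagged_predict` are assumptions chosen to keep the example short, not the book's implementation.

```python
import random


def fit_line(points):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:  # every x in the sample is identical: fall back to a constant
        return mean_y, 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b = sxy / sxx
    return mean_y - b * mean_x, b


def bagged_predict(points, x_new, n_models=25, seed=0):
    """Average the predictions of models fitted on bootstrap samples."""
    rng = random.Random(seed)
    predictions = []
    for _ in range(n_models):
        # Bootstrap sample: draw len(points) items with replacement.
        sample = [rng.choice(points) for _ in points]
        a, b = fit_line(sample)
        predictions.append(a + b * x_new)
    # Aggregate by averaging the base models' predictions.
    return sum(predictions) / len(predictions)


data = [(x, 2.0 * x + 1.0) for x in range(10)]  # noiseless y = 2x + 1
prediction = bagged_predict(data, x_new=4.0)
```

Bagging tends to help most with high-variance base learners such as decision trees. A straight-line fit is used here only because it fits in a few lines.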

Chapter 6, When in Doubt, Use Random Forests, introduces the random forest algorithm. It explains which ensemble techniques random forests use and how they help our models avoid overfitting.

Chapter 7, Boosting Model Performance with Boosting, introduces boosting and discusses how it improves a model's performance by reducing bias and variance and increasing accuracy. The chapter also notes that boosting is not robust against outliers and noisy data, but is flexible and can be used with a choice of loss functions.
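One way to picture boosting is as a sequence of weak models, each fitted to the errors left by the models before it. The sketch below assumes a gradient-boosting-style procedure with squared-error loss and one-split "stumps" as weak learners. It is only one of several boosting variants, and the names `fit_stump` and `boost` are hypothetical.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that minimises squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Fit stumps sequentially, each one on the current residuals."""
    base = sum(ys) / len(ys)  # start from the mean prediction
    predictions = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump(x)
                       for p, x in zip(predictions, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


def mse(model, xs, ys):
    """Mean squared error of a model on the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = list(range(8))
ys = [x * x for x in xs]
mean_y = sum(ys) / len(ys)
baseline_mse = mse(lambda x: mean_y, xs, ys)   # constant model
boosted_mse = mse(boost(xs, ys), xs, ys)       # boosted stumps
```

Because every round chases the remaining residuals, an outlier keeps attracting attention from later stumps, which is one intuition for the sensitivity to outliers and noisy data mentioned above.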

Chapter 8, Blend It with Stacking, applies stacking to learn the optimal combination of base learners. This chapter will acquaint readers with stacking, which is also known as stacked generalization.

Chapter 9, Homogeneous Ensemble Using Keras, is a complete code walk-through of a classification case study for recognizing handwritten digits with homogeneous algorithms, in this case multiple neural network models built with Keras.

Chapter 10, Heterogeneous Ensemble Classifiers Using H2O, is a complete code walk-through of a classification case study for default prediction with an ensemble of multiple heterogeneous algorithms using H2O.

Chapter 11, Heterogeneous Ensemble for Text Classification Using NLP, is a complete code walk-through of a classification case study that classifies sentiment polarity using an ensemble of multiple heterogeneous algorithms. NLP techniques are used to extract information from the text and improve classification accuracy, and the mined text information is then fed to ensemble classification techniques for sentiment analysis. In this case study, the scikit-learn library is used for building models.

Chapter 12, Homogeneous Ensemble for Multiclass Classification Using Keras, is a complete code walk-through of a multiclass classification case study that builds a homogeneous ensemble based on data diversity, using the tf.keras module from TensorFlow.