Hands-On Machine Learning with C++

Introduction to Machine Learning with C++

There are different approaches to make computers solve tasks. One of them is to define an explicit algorithm, and another one is to use implicit strategies based on mathematical and statistical methods. Machine Learning (ML) is one of the implicit methods that uses mathematical and statistical approaches to solve tasks. It is an actively growing discipline, and a lot of scientists and researchers find it to be one of the best ways to move forward toward systems acting as human-level artificial intelligence (AI).

In general, ML approaches are based on the idea of searching for patterns in a given dataset. Consider a recommendation system for a news feed, which provides the user with a personalized feed based on their previous activity or preferences. The software gathers information about the type of news article the user reads and calculates some statistics. For example, it could be the frequency of some topics appearing in a set of news articles. Then, it performs some predictive analytics, identifies general patterns, and uses them to populate the user's news feed. Such systems periodically track a user's activity, update the dataset, and calculate new trends for recommendations.

There are many areas where ML has started to play an important role. It is used for solving enterprise business tasks as well as for scientific research. In customer relationship management (CRM) systems, ML models are used to analyze sales team activity and help the team process the most important requests first. ML models are used in business intelligence (BI) and analytics to find essential data points. Human resource (HR) departments use ML models to analyze their employees' characteristics in order to identify the most effective ones and use this information when searching for applicants for open positions.

A fast-growing direction of research is self-driving cars, and deep learning neural networks are used extensively in this area. They are used in computer vision systems for object identification as well as for navigation and steering systems, which are necessary for car driving.

Another popular use of ML systems is electronic personal assistants, such as Siri from Apple or Alexa from Amazon. Such products also use deep learning models to analyze natural speech or written text to process users' requests and make a natural response in a relevant context. Such requests can activate music players with preferred songs, as well as update a user's personal schedule or book flight tickets.

This chapter describes what ML is and which tasks can be solved with ML, and discusses different approaches used in ML. It aims to show the minimally required math to start implementing ML algorithms. It also covers how to perform basic linear algebra operations in libraries such as Eigen, xtensor, Shark-ML, Shogun, and Dlib, and also explains the linear regression task as an example.

The following topics will be covered in this chapter:

Understanding the fundamentals of ML
An overview of linear algebra
An overview of a linear regression example

Understanding the fundamentals of ML

There are different approaches to creating and training ML models. In this section, we show what these approaches are and how they differ. Apart from the approach we use to create an ML model, there are also parameters that manage how this model behaves in the training and evaluation processes. Model parameters can be divided into two distinct groups, which should be configured in different ways. The last crucial part of the ML process is the technique that we use to train a model. Usually, the training technique uses some numerical optimization algorithm that finds the minimal value of a target function. In ML, the target function is usually called a loss function and is used for penalizing the training algorithm when it makes errors. We discuss these concepts more precisely in the following sections.

Venturing into the techniques of ML

We can divide ML approaches into two techniques, as follows:

Supervised learning is an approach based on the use of labeled data. Labeled data is a set of known data samples with corresponding known target outputs. This kind of data is used to build a model that can predict future outputs.
Unsupervised learning is an approach that does not require labeled data and can search hidden patterns and structures in an arbitrary kind of data.

Let's have a look at each of the techniques in detail.

Supervised learning

Supervised ML algorithms usually take a limited set of labeled data and build models that can make reasonable predictions for new data. We can split supervised learning algorithms into two main parts, classification and regression techniques, described as follows:

Classification models predict some finite and distinct types of categories—this could be a label that identifies if an email is spam or not, or whether an image contains a human face or not. Classification models are applied in speech and text recognition, object identification on images, credit scoring, and others. Typical algorithms for creating classification models are Support Vector Machine (SVM), decision tree approaches, k-nearest neighbors (KNN), logistic regression, Naive Bayes, and neural networks. The following chapters describe the details of some of these algorithms.
Regression models predict continuous responses such as changes in temperature or values of currency exchange rates. Regression models are applied in algorithmic trading, forecasting of electricity load, revenue prediction, and others. Creating a regression model usually makes sense if the output of the given labeled data is real numbers. Typical algorithms for creating regression models are linear and multivariate regressions, polynomial regression models, and stepwise regressions. We can use decision tree techniques and neural networks to create regression models too. The following chapters describe the details of some of these algorithms.

Unsupervised learning

Unsupervised learning algorithms do not use labeled datasets. They create models that use intrinsic relations in data to find hidden patterns that they can use for making predictions. The most well-known unsupervised learning technique is clustering. Clustering involves dividing a given set of data into a limited number of groups according to some intrinsic properties of data items. Clustering is applied in market research, different types of exploratory analysis, deoxyribonucleic acid (DNA) analysis, image segmentation, and object detection. Typical algorithms for creating models for performing clustering are k-means, k-medoids, Gaussian mixture models, hierarchical clustering, and hidden Markov models. Some of these algorithms are explained in the following chapters of this book.
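To make the clustering idea concrete, the following is a minimal sketch of k-means on one-dimensional data. It assumes the initial centroids are supplied by the caller and runs a fixed number of iterations; the function names nearest_centroid and kmeans_1d are illustrative and do not come from any of the libraries discussed in this book.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Returns the index of the centroid closest to `value`.
std::size_t nearest_centroid(double value, const std::vector<double>& centroids) {
    assert(!centroids.empty());
    std::size_t best = 0;
    double best_dist = std::numeric_limits<double>::max();
    for (std::size_t c = 0; c < centroids.size(); ++c) {
        const double dist = std::abs(value - centroids[c]);
        if (dist < best_dist) {
            best_dist = dist;
            best = c;
        }
    }
    return best;
}

// Runs a fixed number of k-means iterations on 1-D data.
// No labels are used: groups emerge only from distances between items.
std::vector<double> kmeans_1d(const std::vector<double>& data,
                              std::vector<double> centroids,
                              int iterations) {
    for (int it = 0; it < iterations; ++it) {
        std::vector<double> sum(centroids.size(), 0.0);
        std::vector<std::size_t> count(centroids.size(), 0);
        // Assignment step: attach every item to its nearest centroid.
        for (double v : data) {
            const std::size_t c = nearest_centroid(v, centroids);
            sum[c] += v;
            ++count[c];
        }
        // Update step: move each centroid to the mean of its items.
        for (std::size_t c = 0; c < centroids.size(); ++c) {
            if (count[c] > 0) {
                centroids[c] = sum[c] / static_cast<double>(count[c]);
            }
        }
    }
    return centroids;
}
```

For the data {1, 2, 3, 10, 11, 12} with initial centroids {0, 5}, the centroids settle at 2 and 11, which are the means of the two visible groups.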

Dealing with ML models

We can interpret ML models as functions that take different types of parameters. Such functions provide outputs for given inputs based on the values of these parameters. Developers can configure the behavior of ML models for solving problems by adjusting model parameters. Training an ML model can usually be treated as a process of searching for the best combination of its parameters. We can split the ML model's parameters into two types. The first type consists of parameters internal to the model, and we can estimate their values from the training (input) data. The second type consists of parameters external to the model, and we cannot estimate their values from training data. Parameters that are external to the model are usually called hyperparameters.

Internal parameters have the following characteristics:

They are necessary for making predictions.
They define the quality of the model on the given problem.
We can learn them from training data.
Usually, they are a part of the model.

If the model contains a fixed number of internal parameters, it is called parametric. Otherwise, we can classify it as non-parametric.

Examples of internal parameters are as follows:

Weights of artificial neural networks (ANNs)
Support vector values for SVM models
Coefficients for linear regression or logistic regression
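The view of a model as a function of its internal parameters can be sketched in a few lines. The LinearModel type below is hypothetical and serves only as an illustration: its bias and weights are the internal parameters that a training algorithm would estimate from data, and the number of them is fixed, which makes the model parametric.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A linear model: y = bias + sum_j weights[j] * x[j].
// `bias` and `weights` are internal parameters; prediction is
// impossible without them, and training would learn their values.
struct LinearModel {
    double bias = 0.0;
    std::vector<double> weights;

    double predict(const std::vector<double>& x) const {
        assert(x.size() == weights.size());
        double y = bias;
        for (std::size_t j = 0; j < x.size(); ++j) {
            y += weights[j] * x[j];
        }
        return y;
    }
};
```

With bias 1 and weights {2, 3}, the input {1, 1} produces 1 + 2 + 3 = 6.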

On the other hand, hyperparameters have the following characteristics:

They are used to configure algorithms that estimate model parameters.
The practitioner usually specifies them.
Their estimation is often based on using heuristics.
They are specific to a concrete modeling problem.

It is hard to know the best values for a model's hyperparameters for a specific problem. Practitioners usually need to perform additional research on how to tune the required hyperparameters so that a model or a training algorithm behaves in the best way. To estimate hyperparameters, they use rules of thumb, values copied from similar projects, and special techniques such as grid search.

Examples of hyperparameters are as follows:

C and sigma parameters used in the SVM algorithm for a classification quality configuration
The learning rate parameter that is used in the neural network training process to configure algorithm convergence
The k value that is used in the KNN algorithm to configure the number of neighbors
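As an illustration of the last item, the following sketch implements a one-dimensional KNN classifier for two classes. The Sample type and the knn_predict function are hypothetical names chosen for this example. Note that k is passed in by the caller rather than learned from the data, which is exactly what makes it a hyperparameter.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

struct Sample {
    double feature;  // single input value
    int label;       // class label: 0 or 1
};

// Predicts the label of `query` by majority vote among the k nearest
// training samples. `k` is chosen by the practitioner, not estimated
// from the training data.
int knn_predict(const std::vector<Sample>& train, double query, std::size_t k) {
    assert(k > 0 && k <= train.size());
    // Pair the distance to the query with the label of each sample.
    std::vector<std::pair<double, int>> by_distance;
    by_distance.reserve(train.size());
    for (const Sample& s : train) {
        by_distance.emplace_back(std::abs(s.feature - query), s.label);
    }
    // Move the k smallest distances to the front of the vector.
    std::partial_sort(by_distance.begin(),
                      by_distance.begin() + static_cast<std::ptrdiff_t>(k),
                      by_distance.end());
    std::size_t votes_for_one = 0;
    for (std::size_t i = 0; i < k; ++i) {
        if (by_distance[i].second == 1) {
            ++votes_for_one;
        }
    }
    return 2 * votes_for_one > k ? 1 : 0;
}
```

With training samples (1.0, 0), (2.0, 0), (2.4, 0), (3.0, 1), (5.0, 1) and the query 2.9, the prediction is class 1 for k = 1 but class 0 for k = 3, so the same data gives different answers depending on the hyperparameter.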

Model parameter estimation

Model parameter estimation usually uses some optimization algorithm. The speed and quality of the resulting model can significantly depend on the optimization algorithm chosen. Research on optimization algorithms is a popular topic in industry, as well as in academia. ML often uses optimization techniques and algorithms based on the optimization of a loss function. A function that evaluates how well a model predicts on the data is called a loss function. If predictions are very different from the target outputs, the loss function will return a value that can be interpreted as a bad one, usually a large number. In such a way, the loss function penalizes an optimization algorithm when it moves in the wrong direction. So, the general idea is to minimize the value of the loss function to reduce penalties. There is no one universal loss function for optimization algorithms. Different factors determine how to choose a loss function. Examples of such factors are as follows:

Specifics of the given problem—for example, if it is a regression or a classification model
Ease of calculating derivatives
Percentage of outliers in the dataset
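The effect of outliers on the choice of loss function can be shown with a small sketch. It compares two common losses, the mean squared error and the mean absolute error; the function names are illustrative and not taken from a particular library.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Mean squared error: each error is squared, so large errors dominate.
double mean_squared_error(const std::vector<double>& target,
                          const std::vector<double>& predicted) {
    assert(!target.empty() && target.size() == predicted.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < target.size(); ++i) {
        const double diff = target[i] - predicted[i];
        sum += diff * diff;
    }
    return sum / static_cast<double>(target.size());
}

// Mean absolute error: each error counts linearly, so a single
// outlier has less influence on the result.
double mean_absolute_error(const std::vector<double>& target,
                           const std::vector<double>& predicted) {
    assert(!target.empty() && target.size() == predicted.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < target.size(); ++i) {
        sum += std::abs(target[i] - predicted[i]);
    }
    return sum / static_cast<double>(target.size());
}
```

For targets {1, 2, 3, 4} and predictions {1, 2, 3, 14}, where a single prediction is off by 10, the mean squared error is 25 while the mean absolute error is 2.5. A squared loss therefore penalizes the one large error far more heavily.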

In ML, the term optimizer is used to define an algorithm that connects a loss function and a technique for updating model parameters in response to the values of the loss function. So, optimizers tune ML models to predict target values for new data in the most accurate way by fitting model parameters. There are many optimizers: Gradient Descent, Adagrad, RMSProp, Adam, and others. Moreover, developing new optimizers is an active area of research. For example, there is the ML and Optimization research group at Microsoft (located in Redmond) whose research areas include combinatorial optimization, convex and non-convex optimization, and their application in ML and AI. Other companies in the industry also have similar research groups; there are many publications from Facebook Research, Amazon Research, and OpenAI groups.

An overview of linear regression

Consider an example of the real-world supervised ML algorithm called linear regression. In general, linear regression is an approach for modeling a target value (dependent value) based on an explanatory value (independent value). This method is used for forecasting and finding relationships between values. We can classify regression methods by the number of inputs (independent variables) and the type of relationship between the inputs and outputs (dependent variables).

Simple linear regression is the case where the number of independent variables is 1, and there is a linear relationship between the independent (x) and dependent (y) variable.

Linear regression is widely used in different areas, such as scientific research, where it can describe relationships between variables, as well as in applications within industry, such as revenue prediction. For example, it can estimate a trend line that represents the long-term movement in stock price time-series data. It tells whether the value of interest in a specific dataset has increased or decreased over the given period, as illustrated in the following screenshot:

If we have one input variable (independent variable) and one output variable (dependent variable) the regression is called simple, and we use the term simple linear regression for it. With multiple independent variables, we call this multiple linear regression or multivariable linear regression. Usually, when we are dealing with real-world problems, we have a lot of independent variables, so we model such problems with multiple regression models. Multiple regression models have a universal definition that covers other types, so even simple linear regression is often defined using the multiple regression definition.

Solving linear regression tasks with different libraries

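Assume that we have a dataset, $\{y_i, x_{i1}, \ldots, x_{ip}\}_{i=1}^{n}$, so that we can express the linear relation between y and x with a mathematical formula in the following way:

$$y_i = \beta_0 + \beta_1 x_{i1} + \ldots + \beta_p x_{ip} + \varepsilon_i = \mathbf{x}_i^T \boldsymbol{\beta} + \varepsilon_i, \quad i = 1, \ldots, n$$

Here, p is the dimension of the independent variable, and T denotes the transpose, so that $\mathbf{x}_i^T \boldsymbol{\beta}$ is the inner product between the vectors $\mathbf{x}_i = (1, x_{i1}, \ldots, x_{ip})^T$ and $\boldsymbol{\beta}$. Also, we can rewrite the previous expression in matrix notation, as follows:

$$\mathbf{y} = \mathbf{x} \boldsymbol{\beta} + \boldsymbol{\varepsilon}$$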

The preceding matrix notation can be explained as follows:

y: This is a vector of observed target values.
x: This is a matrix of row-vectors, $\mathbf{x}_i^T$, which are known as explanatory or independent values.
β: This is a (p+1)-dimensional parameter vector.
ε: This is called an error term or noise. This variable captures all factors other than the regressors that influence the dependent variable y.

When we are considering simple linear regression, p is equal to 1, and the equation will look like this:

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$

The goal of the linear regression task is to find parameter vectors that satisfy the previous equation. Usually, there is no exact solution to such a system of linear equations, so the task is to estimate parameters that satisfy these equations with some assumptions. One of the most popular estimation approaches is based on the principle of least squares: minimizing the sum of the squares of the differences between the observed dependent variable in the given dataset and the values predicted by the linear function. This is called the ordinary least squares (OLS) estimator. So, the task can be formulated with the following formula:

$$\hat{\boldsymbol{\beta}} = \arg\min_{\boldsymbol{\beta}} S(\boldsymbol{\beta})$$

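In the preceding formula, the objective function S is given by the following matrix notation:

$$S(\boldsymbol{\beta}) = \sum_{i=1}^{n} \left| y_i - \sum_{j=0}^{p} x_{ij} \beta_j \right|^2 = \left\| \mathbf{y} - \mathbf{x} \boldsymbol{\beta} \right\|^2, \quad x_{i0} = 1$$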

This minimization problem has a unique solution, provided that the columns of the x matrix are linearly independent. We can get this solution by solving the normal equation, as follows:

$$\hat{\boldsymbol{\beta}} = \left( \mathbf{x}^T \mathbf{x} \right)^{-1} \mathbf{x}^T \mathbf{y}$$

Linear algebra libraries can solve such equations directly with an analytical approach, but it has one significant disadvantage: computational cost. When the dimensions of y and x are large, the memory and computation time required become too great for real-world tasks.
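For simple linear regression (p equal to 1), the normal equation reduces to two well-known closed-form expressions: the slope is the sum of products of the centered x and y values divided by the sum of squared centered x values, and the intercept follows from the means. The following sketch computes them with the standard library only; the Line type and the fit_ols function are illustrative names, not part of any library used in this book.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Line {
    double intercept;  // beta_0
    double slope;      // beta_1
};

// Closed-form OLS solution for one independent variable.
Line fit_ols(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size() && x.size() >= 2);
    const double n = static_cast<double>(x.size());
    double mean_x = 0.0;
    double mean_y = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        mean_x += x[i];
        mean_y += y[i];
    }
    mean_x /= n;
    mean_y /= n;

    double sxy = 0.0;  // sum of (x_i - mean_x) * (y_i - mean_y)
    double sxx = 0.0;  // sum of (x_i - mean_x)^2
    for (std::size_t i = 0; i < x.size(); ++i) {
        sxy += (x[i] - mean_x) * (y[i] - mean_y);
        sxx += (x[i] - mean_x) * (x[i] - mean_x);
    }
    // If all x values are equal, the columns of the design matrix are
    // linearly dependent and no unique solution exists.
    assert(sxx > 0.0);

    const double slope = sxy / sxx;
    return Line{mean_y - slope * mean_x, slope};
}
```

For x = {0, 1, 2, 3} and y = {1, 3, 5, 7}, the points lie exactly on a line, and the function recovers the intercept 1 and the slope 2.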

So, usually, this minimization task is solved with iterative approaches. Gradient descent (GD) is an example of such an algorithm. GD is a technique based on the observation that if the function $S(\boldsymbol{\beta})$ is defined and is differentiable in a neighborhood of a point $\boldsymbol{\beta}$, then $S(\boldsymbol{\beta})$ decreases fastest when it goes in the direction of the negative gradient of S at the point $\boldsymbol{\beta}$.

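We can change our objective function to a form more suitable for an iterative approach. We can use the mean squared error (MSE) function, which measures the difference between the estimator and the estimated value, as illustrated here:

$$\mathrm{MSE}(\boldsymbol{\beta}) = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \mathbf{x}_i^T \boldsymbol{\beta} \right)^2$$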

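In the case of multiple regression, we take the partial derivative of this function with respect to each component of β (with $x_{i0} = 1$ for the intercept term), as follows:

$$\frac{\partial\, \mathrm{MSE}}{\partial \beta_j} = -\frac{2}{n} \sum_{i=1}^{n} x_{ij} \left( y_i - \mathbf{x}_i^T \boldsymbol{\beta} \right), \quad j = 0, \ldots, p$$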

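So, in the case of simple linear regression, we take the following derivatives:

$$\frac{\partial\, \mathrm{MSE}}{\partial \beta_0} = -\frac{2}{n} \sum_{i=1}^{n} \left( y_i - (\beta_0 + \beta_1 x_i) \right), \qquad \frac{\partial\, \mathrm{MSE}}{\partial \beta_1} = -\frac{2}{n} \sum_{i=1}^{n} x_i \left( y_i - (\beta_0 + \beta_1 x_i) \right)$$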

The whole algorithm has the following description:

1. Initialize β with zeros.
2. Define a value for the learning rate parameter, l, that controls how much we adjust the parameters during the learning procedure.
3. Calculate the following values of β:

$$\beta_j := \beta_j - l \cdot \frac{\partial\, \mathrm{MSE}}{\partial \beta_j}$$

4. Repeat step 3 a fixed number of times or until the MSE value becomes reasonably small.
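The steps above can be sketched for simple linear regression as follows. This is a minimal illustration that uses a fixed iteration count as the stopping rule; the Line type and the fit_gradient_descent function are hypothetical names, and the learning rate has to be small enough for the given data or the updates will diverge.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Line {
    double intercept;  // beta_0
    double slope;      // beta_1
};

// Batch gradient descent on the MSE of the model y = beta_0 + beta_1 * x.
Line fit_gradient_descent(const std::vector<double>& x,
                          const std::vector<double>& y,
                          double learning_rate,
                          int iterations) {
    assert(!x.empty() && x.size() == y.size());
    const double n = static_cast<double>(x.size());
    Line beta{0.0, 0.0};  // initialize the parameters with zeros
    for (int it = 0; it < iterations; ++it) {
        double grad_intercept = 0.0;  // dMSE / d beta_0
        double grad_slope = 0.0;      // dMSE / d beta_1
        for (std::size_t i = 0; i < x.size(); ++i) {
            const double residual = y[i] - (beta.intercept + beta.slope * x[i]);
            grad_intercept += -2.0 / n * residual;
            grad_slope += -2.0 / n * x[i] * residual;
        }
        // Move against the gradient, scaled by the learning rate.
        beta.intercept -= learning_rate * grad_intercept;
        beta.slope -= learning_rate * grad_slope;
    }
    return beta;
}
```

For x = {0, 1, 2, 3} and y = {1, 3, 5, 7}, a learning rate of 0.05 and 5,000 iterations bring the parameters very close to the exact solution (intercept 1, slope 2).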

The previously described algorithm is one of the simplest supervised ML algorithms. We described it with the linear algebra concepts we introduced earlier in the chapter. Later, it will become more evident that almost all ML algorithms use linear algebra under the hood. The following samples show the higher-level API in different linear algebra libraries for solving the linear regression task, and we provide them to show how libraries can simplify the complicated math used underneath. We will give the details of the APIs used in these samples in the following chapters.

Solving linear regression tasks with Eigen

There are several iterative methods for solving problems of the form $Ax = b$ in the Eigen library. The LeastSquaresConjugateGradient class is one of them; it allows us to solve linear regression problems with the conjugate gradient algorithm. The ConjugateGradient algorithm can converge more quickly to the function's minimum than regular GD but requires that matrix A is positive definite to guarantee numerical stability. The LeastSquaresConjugateGradient class has two main settings: the maximum number of iterations and a tolerance threshold that is used as a stopping criterion (an upper bound on the relative residual error), as illustrated in the following code block:

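#include <Eigen/Dense>
#include <Eigen/IterativeLinearSolvers>

typedef float DType;
using Matrix = Eigen::Matrix<DType, Eigen::Dynamic, Eigen::Dynamic>;
int n = 10000;
Matrix x(n, 2);  // first column holds ones (intercept), second holds the inputs
Matrix y(n, 1);  // observed target values
// ... fill x and y with the training data here ...
Eigen::LeastSquaresConjugateGradient<Matrix> gd;
gd.setMaxIterations(1000);
gd.setTolerance(0.001);
gd.compute(x);
Matrix b = gd.solve(y);  // 2x1 parameter vector

For new x inputs, we can predict new y values with matrix operations, as follows:

Eigen::MatrixXf new_x(5, 2);
new_x << 1, 1, 1, 2, 1, 3, 1, 4, 1, 5;  // each row is (1, x)
Eigen::MatrixXf new_y = new_x * b;      // one prediction per row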

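Also, we can calculate the parameter vector b (the linear regression task solution) by solving the normal equation directly, as follows:

// Assign to a concrete matrix type: with auto, the result would be a lazy
// expression that refers to the temporary LDLT decomposition.
Matrix b = (x.transpose() * x).ldlt().solve(x.transpose() * y);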

Solving linear regression tasks with Shogun

Shogun is an open source ML library that provides a wide range of unified ML algorithms. The Shogun library has the CLinearRidgeRegression class for solving simple linear regression problems. This class solves problems with standard Cholesky matrix decomposition in a noniterative way, as illustrated in the following code block:

auto x = some<CDenseFeatures<float64_t>>(x_values);
auto y = some<CRegressionLabels>(y_values); // real-valued labels
float64_t tau_regularization = 0.0001;
auto lr = some<CLinearRidgeRegression>(tau_regularization, nullptr, nullptr); // regression model with regularization
lr->set_labels(y);
lr->train(x);

For new x inputs, we can predict new y values in the following way:

auto new_x = some<CDenseFeatures<float64_t>>(new_x_values);
auto y_predict = lr->apply_regression(new_x);

Also, we can get the calculated parameters (the linear regression task solution) vector, as follows:

auto weights = lr->get_w();

Moreover, we can calculate the value of MSE, as follows:

auto y_predict = lr->apply_regression(x);
auto eval = some<CMeanSquaredError>();
auto mse = eval->evaluate(y_predict , y);

Solving linear regression tasks with Shark-ML

The Shark-ML library provides the LinearModel class for representing linear regression problems. There are two trainer classes for this kind of model: the LinearRegression class, which provides analytical solutions, and the LinearSAGTrainer class, which provides a stochastic average gradient iterative method, as illustrated in the following code block:

using namespace shark;
using namespace std;
Data<RealVector> x;
Data<RealVector> y;
RegressionDataset data(x, y);
LinearModel<> model;

LinearRegression trainer;
trainer.train(model, data);

We can get the calculated parameters (the linear regression task solution) vector by running the following code:

auto b = model.parameterVector();

For new x inputs, we can predict new y values in the following way:

Data<RealVector> new_x;
Data<RealVector> prediction = model(new_x);

Also, we can calculate the value of squared error, as follows:

SquaredLoss<> loss;
auto se = loss(y, prediction);

Solving linear regression tasks with Dlib

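The Dlib library provides the krr_trainer class, which can take a template argument of the linear_kernel type to solve linear regression tasks. This class implements a direct analytical solution for this type of problem with the kernel ridge regression algorithm, as illustrated in the following code block:

using namespace dlib;
// KernelType is not defined in the original snippet; a linear kernel over
// the sample type is assumed here.
using SampleType = matrix<double>;
using KernelType = linear_kernel<SampleType>;
std::vector<SampleType> x;  // training inputs
std::vector<double> y;      // training targets, same scalar type as the samples
krr_trainer<KernelType> trainer;
trainer.set_kernel(KernelType());
decision_function<KernelType> df = trainer.train(x, y);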

For new x inputs, we can predict new y values in the following way:

std::vector<matrix<double>> new_x;
// Evaluate the trained decision function on every new sample.
for (auto& v : new_x) {
    auto prediction = df(v);
    std::cout << prediction << std::endl;
}

