At the heart of machine learning are the various algorithms used to solve complex problems. As mentioned in the introduction, this book will cover five algorithms:
- Binary classification
- Regression
- Anomaly detection
- Clustering
- Matrix factorization
Each will be the focus of a chapter later in the book, but for now, let's get a quick overview of them.
Binary classification
One of the easiest algorithms to understand is binary classification. Binary classification is a supervised machine learning algorithm. As the name implies, the output of a model trained with a binary classification algorithm will return a true or false conviction (as in 0 or 1). Problems best suited to a binary classification model include determining whether a comment is hateful or whether a file is malicious. ML.NET provides several binary classification model algorithms, which we will cover in Chapter 4, Classification Model, along with a working example of determining whether a file is malicious or not.
Regression
Another powerful yet easy-to-understand algorithm is regression. Regression is another supervised machine learning algorithm. Regression algorithms return a real value as opposed to a binary algorithm or ones that return from a set of specific values. You can think of regression algorithms as an algebra equation solver where there are a number of known values and the goal is to predict the one unknown value. Some examples of problems best suited to regression algorithms are predicting attrition, weather forecasting, stock market predictions, and house pricing, to name a few.
In addition, there is a subset of regression algorithms called logistic regression models. Whereas a traditional linear regression algorithm, as described earlier, returns the predicted value, a logistic regression model will return the probability of the outcome occurring.
ML.NET provides several regression model algorithms, which we will cover in Chapter 3, Regression Model.
Anomaly detection
Anomaly detection, as the name implies, looks for unexpected events in the data submitted to the model. Data for this algorithm, as you can probably guess, requires data over a period of time. Anomaly detection in ML.NET looks at both spikes and change points. Spikes, as the name implies, are temporary, whereas change points are the starting points of a longer change.
ML.NET provides an anomaly detection algorithm, which we will cover in Chapter 6, Anomaly Detection Model.
Clustering
Clustering algorithms are unsupervised algorithms and offer a unique solution to problems where finding the closest match to related items is the desired solution. During the training of the data, the data is grouped based on the features, and then during the prediction, the closest match is chosen. Some examples of the use of clustering algorithms include file type classification and predicting customer choices.
ML.NET uses the k-means algorithm specifically, which we will deep dive into in Chapter 5, Clustering Model.
Matrix factorization
Last but not least, the matrix factorization algorithm provides a powerful and easy-to-use algorithm for providing recommendations. This algorithm is tailored to problems where historical data is available and the problem to solve is predicting a selection from that data, such as movie or music predictions. Netflix's movie suggestion system uses a form of matrix factorization for its suggestions about what movies it thinks you will enjoy.
We will cover matrix factorization in detail in Chapter 7, Matrix Factorization Model.