Regression
Regression is a supervised learning technique that helps us learn the correlation between a continuous output parameter called Label and a set of input parameters called Features. Regression produces machine learning models that predict a continuous label, given a feature vector. The concept of regression can be best explained using the following diagram:
In the preceding diagram, the scatterplot represents data points spread across a two-dimensional space. The linear regression algorithm, being a parametric learning algorithm, assumes that the learning function will have a linear form. Thus, it learns the coefficients that are required to represent a straight line that approximately fits the data points on the scatterplot.
Spark MLlib has distributed and scalable implementations of a few prominent regression algorithms, such as linear regression, decision trees, random forests, and gradient boosted trees...