Introducing Estimators
The Estimator class, just like the Transformer class, was introduced in Spark 1.3. Estimators, as the name suggests, estimate the parameters of a model or, in other words, fit models to data.
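To make the idea of "fitting a model to data" concrete, here is a minimal plain-Python sketch of the contract. It does not use PySpark: ToyLinearEstimator and ToyLinearModel are hypothetical names invented for illustration, and the single-feature least-squares fit is only an assumed stand-in for a real learning algorithm. What it mirrors is the pattern in Spark ML, where calling fit() on an Estimator returns a fitted model that can then transform new data.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ToyLinearModel:
    """Hypothetical fitted model: stores the estimated parameters."""
    slope: float
    intercept: float

    def transform(self, xs: List[float]) -> List[float]:
        # Apply the learned parameters to new records.
        return [self.slope * x + self.intercept for x in xs]


class ToyLinearEstimator:
    """Hypothetical estimator: fit() estimates parameters and returns a model."""

    def fit(self, data: List[Tuple[float, float]]) -> ToyLinearModel:
        n = len(data)
        mean_x = sum(x for x, _ in data) / n
        mean_y = sum(y for _, y in data) / n
        # Ordinary least squares for a single feature.
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
        sxx = sum((x - mean_x) ** 2 for x, _ in data)
        slope = sxy / sxx
        intercept = mean_y - slope * mean_x
        return ToyLinearModel(slope, intercept)


# Made-up training pairs (feature, label) that lie on a straight line.
training = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
model = ToyLinearEstimator().fit(training)   # estimate the parameters
predictions = model.transform([4.0, 5.0])    # use them on unseen records
```

The estimator itself holds no learned state; everything estimated from the data lives in the returned model object, which is why the same estimator can be fitted again on a different dataset.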
In this recipe, we will introduce two models: the linear SVM acting as a classification model, and a linear regression model predicting the forest elevation.
Here is a list of all of the Estimators, or machine learning models, available in the ML module:
- Classification:
  - LinearSVC is an SVM model for linearly separable problems. The SVM's decision boundary has the form w · x + b = 0 (a hyperplane), where w is the vector of coefficients (or a normal vector to the hyperplane), x is a record's feature vector, and b is the offset.
  - LogisticRegression is a default, go-to classification model for linearly separable problems. It uses the logistic function (the inverse of the logit) to calculate the probability of a record being a member of a particular class.
  - DecisionTreeClassifier is a decision tree-based model used for classification purposes. It builds a binary...