SVM is also widely used for large-scale classification tasks (both binary and multiclass). It is also a linear ML method, as described in Chapter 1, Analyzing Insurance Severity Claim. The linear SVM algorithm outputs an SVM model, and the loss function it uses is the hinge loss, defined as follows:

$$L(\mathbf{w}; \mathbf{x}, y) := \max\{0,\ 1 - y\,\mathbf{w}^T\mathbf{x}\}$$

Here, $\mathbf{w}$ is the weight vector, $\mathbf{x}$ is a data point, and $y \in \{-1, +1\}$ is its label.
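To make the hinge loss concrete, the following is a minimal sketch in plain Python. It computes max(0, 1 − y·wᵀx) for a single point. It assumes that labels are encoded as −1/+1, that `w` and `x` are plain lists of equal length with no bias term, and that the helper names `dot` and `hinge_loss` are illustrative only and not part of Spark's API.

```python
# Minimal sketch of the hinge loss used by a linear SVM.
# Assumptions: labels y are in {-1, +1}; w and x are equal-length lists;
# no bias/intercept term. Names are illustrative, not Spark's API.

def dot(w, x):
    """Inner product w^T x."""
    return sum(wi * xi for wi, xi in zip(w, x))

def hinge_loss(w, x, y):
    """L(w; x, y) = max(0, 1 - y * w^T x)."""
    return max(0.0, 1.0 - y * dot(w, x))

w = [0.5, -1.0]
print(hinge_loss(w, [2.0, 0.5], +1))  # y * w^T x = 0.5, inside the margin -> 0.5
print(hinge_loss(w, [4.0, 0.0], +1))  # y * w^T x = 2.0, beyond the margin -> 0.0
print(hinge_loss(w, [2.0, 0.5], -1))  # misclassified point -> 1.5
```

A point contributes zero loss only when it is correctly classified with a margin of at least 1. Points inside the margin or on the wrong side are penalized linearly.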
Linear SVMs in Spark are trained with L2 regularization by default. Spark also supports L1 regularization, in which case the optimization problem becomes a linear program.
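The sketch below shows where the regularizer enters the training objective. It assumes the common formulation f(w) = λ·R(w) + (1/n)·Σᵢ L(w; xᵢ, yᵢ), with R(w) = ½‖w‖₂² for L2 and R(w) = ‖w‖₁ for L1. The parameter names `reg_param` and `reg_type` and the sample data are hypothetical and chosen only for illustration. It evaluates the objective and does not train a model.

```python
# Sketch of the regularized linear SVM objective:
#   f(w) = reg_param * R(w) + (1/n) * sum_i max(0, 1 - y_i * w^T x_i)
# with R(w) = 0.5 * ||w||_2^2 (L2) or R(w) = ||w||_1 (L1).
# Assumptions: labels in {-1, +1}; hypothetical names, not Spark's API.

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def hinge_loss(w, x, y):
    return max(0.0, 1.0 - y * dot(w, x))

def regularizer(w, reg_type="l2"):
    """Penalty term R(w) for the chosen regularization type."""
    if reg_type == "l2":
        return 0.5 * sum(wi * wi for wi in w)
    if reg_type == "l1":
        return sum(abs(wi) for wi in w)
    raise ValueError("reg_type must be 'l2' or 'l1'")

def svm_objective(w, data, reg_param, reg_type="l2"):
    """Average hinge loss over data plus the weighted penalty."""
    avg_loss = sum(hinge_loss(w, x, y) for x, y in data) / len(data)
    return reg_param * regularizer(w, reg_type) + avg_loss

data = [([2.0, 0.5], 1), ([4.0, 0.0], 1), ([-1.0, 1.0], -1)]
w = [0.5, -1.0]
print(svm_objective(w, data, reg_param=0.1, reg_type="l2"))
print(svm_objective(w, data, reg_param=0.1, reg_type="l1"))
```

With the L1 penalty, both the hinge loss and ‖w‖₁ are piecewise linear. This is why the problem can be rewritten as a linear program. The L2 penalty is quadratic and smooth.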
Now, given a new data point $\mathbf{x}$, the model makes a prediction based on the value of $\mathbf{w}^T\mathbf{x}$. By default, if $\mathbf{w}^T\mathbf{x} \geq 0$, the outcome is positive; otherwise, it is negative.
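The default decision rule can be sketched in a few lines. This is a minimal illustration, not Spark's implementation. It assumes the positive and negative classes are returned as +1 and −1, and it exposes a hypothetical `threshold` argument that defaults to 0.0 to mirror the rule described above.

```python
# Sketch of the linear SVM decision rule: positive if w^T x >= threshold.
# Assumptions: classes returned as +1 / -1; `threshold` is a hypothetical
# parameter name defaulting to 0.0, matching the default rule in the text.

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def predict(w, x, threshold=0.0):
    """Return +1 if w^T x >= threshold, otherwise -1."""
    return 1 if dot(w, x) >= threshold else -1

w = [0.5, -1.0]
new_points = [[2.0, 0.5], [-1.0, 1.0], [2.0, 1.0]]
print([predict(w, x) for x in new_points])  # [1, -1, 1]
```

The third point lies exactly on the decision boundary (wᵀx = 0). Because the comparison is ≥, it is assigned to the positive class.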
Now that we know how SVMs work, let's use the Spark-based implementation of SVM. Let's start...