scikit-learn implementation
To allow a more flexible separating hyperplane, all scikit-learn implementations are based on a simple variant that adds so-called slack variables to the function to minimize:
In this case, the constraints become:
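For reference, the soft-margin problem is commonly written as shown in the sketch below. The notation is an assumption and may differ from the symbols used elsewhere in the book: n samples x_i with labels y_i in {-1, +1}, weight vector w, bias b, and one slack variable \zeta_i per sample.

```latex
% Objective: margin term plus the C-weighted sum of slack variables
\min_{\bar{w},\, b,\, \zeta} \quad \frac{1}{2}\,\lVert \bar{w} \rVert^{2} + C \sum_{i=1}^{n} \zeta_i

% Constraints: each sample may violate the margin by at most its slack
\text{subject to} \quad y_i \left( \bar{w}^{T} \bar{x}_i + b \right) \ge 1 - \zeta_i,
\qquad \zeta_i \ge 0 \quad \forall\, i = 1, \dots, n
```

A sample with \zeta_i = 0 respects the margin, 0 < \zeta_i \le 1 places it inside the margin but on the correct side, and \zeta_i > 1 means it is misclassified.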
The slack variables create a flexible (soft) margin: some vectors belonging to a class may lie on the opposite side of the separating hyperplane and still be included in the model training. The strength of this flexibility is set with the parameter C, which weights the penalty on margin violations. Small values (close to zero) yield a very soft margin that tolerates many violations (also increasing the misclassification rate on the training set), while larger values penalize violations more heavily and approach a hard margin. The right choice of C is not obvious, but a good value can be found automatically with a grid search, as seen in the previous chapters. In our examples, we keep the default value of 1.
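As an illustration of how C can be tuned, the following minimal sketch runs a grid search over a few candidate values with a linear-kernel SVC. The synthetic dataset, the candidate grid, and the variable names are assumptions chosen for the example, not values taken from the text.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic two-class dataset (illustrative only)
X, y = make_classification(
    n_samples=300,
    n_features=2,
    n_informative=2,
    n_redundant=0,
    n_clusters_per_class=1,
    random_state=1000,
)

# Candidate values for C (hypothetical grid spanning several orders of magnitude)
param_grid = {"C": [0.01, 0.1, 1.0, 10.0, 100.0]}

# 5-fold cross-validated search, scored by accuracy
search = GridSearchCV(
    SVC(kernel="linear"),
    param_grid=param_grid,
    scoring="accuracy",
    cv=5,
)
search.fit(X, y)

best_C = search.best_params_["C"]
best_score = search.best_score_

print("Best C: {}".format(best_C))
print("Mean CV accuracy: {:.3f}".format(best_score))
```

The selected value depends on the data, so it is worth widening or refining the grid around the best candidate before settling on it.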
Linear classification
Our first example is based on a linear SVM, as described...