Support vector classification
We need our data to be linearly separable in order to classify it with a maximal margin classifier. When our data is not linearly separable, we can still use the notion of support vectors that define a margin, but this time, we will allow some examples to be misclassified. Thus, we essentially define a soft margin, in that some of the observations in our dataset can violate the constraint that they need to be at least as far as the margin from the separating hyperplane. It is also important to note that sometimes we may want to use a soft margin even for linearly separable data. The reason for this is in order to limit the degree of overfitting the data. Note that the larger the margin, the more confident we are about our ability to correctly classify new observations, because the classes are further apart from each other in our training data. If we achieve separation using a very small margin, we are less confident about our ability to correctly classify our...