Summary
In this chapter, we reviewed two new classification techniques: KNN and SVM. The goal was to discover how these techniques work, and how they differ, by building and comparing models on a common dataset to predict whether an individual had diabetes. For KNN, we applied both the unweighted and weighted nearest neighbor algorithms. Neither performed as well as the SVMs at this prediction task.
We examined how to build and tune both the linear and nonlinear support vector machines using the e1071 package. We used the extremely versatile caret package to compare the predictive ability of a linear and nonlinear support vector machine and saw that the nonlinear support vector machine with a sigmoid kernel performed the best.
Finally, we touched on how you can use the caret package to perform a crude feature selection, which is a difficult challenge with a black-box technique such as SVM. This is a major challenge when using these techniques and...