The k-Nearest Neighbors (kNN) algorithm is a supervised learning algorithm that, given a data point, will try to classify it based on its similarity to a set of training examples of known classes. In this recipe, we'll look at taking a dataset, dividing it into a test and train set, and predicting the test classes from a model built on the training set. These sorts of approaches are widely applicable in bioinformatics and can be invaluable in clustering when we have some known examples of our target classes.
Learning groupings within data and classifying with kNN
Getting ready
For this recipe, we'll need a few new packages: caret, class, dplyr, and magrittr. As a dataset, we will use the built-in iris dataset.
...