Summary
In this chapter, we learned about classification using k-nearest neighbors. Unlike many classification algorithms, kNN does not do any learning. It simply stores the training data verbatim. Unlabeled test examples are then matched to the most similar records in the training set using a distance function, and the unlabeled example is assigned the label of its neighbors.
In spite of the fact that kNN is a simple algorithm, it is capable of tackling extremely complex tasks, such as identifying cancerous masses. In a few simple lines of R code, we were able to correctly identify whether a mass was malignant or benign 98 percent of the time.
In the next chapter, we will examine a classification method that uses probability to estimate the likelihood that an observation falls into certain categories. It will be interesting to compare how this approach differs from kNN. Later on, in Chapter 9, Finding Groups of Data – Clustering with k-means, we will learn about a close relative to kNN, which...