Case study
We'll return to some material from an earlier chapter and apply some careful testing to be sure we've got a good, workable implementation. Back in Chapter 3, When Objects Are Alike, we looked at the distance computations that are part of the k-nearest neighbors classifier. There, we examined several computations that produce slightly different results:
- Euclidean distance: This is the length of the direct, straight line from one sample to another.
- Manhattan distance: This follows streets-and-avenues around a grid (like the city of Manhattan), adding up the steps required along a series of straight-line paths.
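- Chebyshev distance: This is the largest of the streets-and-avenues distances, that is, the biggest difference along any single axis.
- Sorensen distance: This is a variation of the Manhattan distance that divides the total by the sum of the two samples' coordinates, so steps between small values count more heavily than steps between large values. It tends to magnify small distances, allowing more subtle discriminations.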
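To make the four definitions concrete before we start testing them, here is a minimal sketch. It assumes each sample is a plain sequence of floats rather than the `Sample` class from Chapter 3, and the function names (`euclidean`, `manhattan`, `chebyshev`, `sorensen`) and the helper `_pairs` are hypothetical, chosen only for illustration.

```python
import math
from typing import Sequence

# Hypothetical representation: a sample is just a sequence of numeric attributes.
Point = Sequence[float]


def _pairs(p: Point, q: Point) -> list[tuple[float, float]]:
    """Pair up matching attributes, insisting both samples have the same length."""
    if len(p) != len(q):
        raise ValueError("samples must have the same number of attributes")
    return list(zip(p, q))


def euclidean(p: Point, q: Point) -> float:
    """Length of the straight line between p and q."""
    return math.sqrt(sum((a - b) ** 2 for a, b in _pairs(p, q)))


def manhattan(p: Point, q: Point) -> float:
    """Sum of the per-axis (streets-and-avenues) steps."""
    return sum(abs(a - b) for a, b in _pairs(p, q))


def chebyshev(p: Point, q: Point) -> float:
    """Largest single per-axis step."""
    return max(abs(a - b) for a, b in _pairs(p, q))


def sorensen(p: Point, q: Point) -> float:
    """Manhattan distance divided by the sum of all coordinates.

    Assumes non-negative attributes (as with physical measurements),
    so the denominator is positive.
    """
    pairs = _pairs(p, q)
    return sum(abs(a - b) for a, b in pairs) / sum(a + b for a, b in pairs)


if __name__ == "__main__":
    p, q = (1.0, 2.0), (4.0, 6.0)
    for fn in (euclidean, manhattan, chebyshev, sorensen):
        print(f"{fn.__name__:>9}: {fn(p, q):.4f}")
```

With the points (1, 2) and (4, 6), the per-axis differences are 3 and 4, so the four functions give 5.0, 7.0, 4.0, and 7/13 (about 0.538). The same pair of samples yields four different answers, which is exactly why each computation deserves its own tests.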
These algorithms all produce distinct results...