To cluster data points, we need a distance or similarity measure that quantifies how close two data points are to each other. Choosing this measure is an essential part of every clustering project because it directly influences how the clusters are formed: clusters produced with one similarity measure can be very different from those produced with another.
The simplest and most common of these measures is the Euclidean distance, or its square, the squared Euclidean distance. The Euclidean distance is simply the straight-line distance between two data points in your feature space (you might remember it from our kNN example in Chapter 5, Classification); the squared Euclidean distance is that same quantity squared. However, there are a whole host of other, sometimes more complicated, distance metrics...
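As a minimal sketch of the two measures just described, the following Python snippet computes the Euclidean and squared Euclidean distances between two points. The function names `euclidean` and `squared_euclidean` and the sample points are illustrative assumptions, not code from this book.

```python
import math


def squared_euclidean(p, q):
    """Sum of squared coordinate differences between points p and q."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of features")
    return sum((a - b) ** 2 for a, b in zip(p, q))


def euclidean(p, q):
    """Straight-line distance: the square root of the squared distance."""
    return math.sqrt(squared_euclidean(p, q))


# Two hypothetical data points in a two-dimensional feature space.
a = (1.0, 2.0)
b = (4.0, 6.0)

print(squared_euclidean(a, b))  # 25.0
print(euclidean(a, b))          # 5.0
```

Because squaring preserves the ordering of non-negative values, both measures agree on which of two points is closer to a third. The squared form simply skips the square root.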