In this section, we will first look at similarity measures. Then, we will learn about hierarchical clustering.
We talked before about different notions of distance in the Computing distances section. Now, I want to talk about the idea of similarity. A similarity score describes how similar two objects are. There is no universal definition of the properties a similarity score must have, but everyone agrees that similar objects should get a high similarity score and dissimilar objects a low one. Dissimilarity is the opposite of similarity, and distance is one form of dissimilarity. Hierarchical clustering uses dissimilarity to form clusters, and a similarity score can be turned into a dissimilarity, for example by subtracting it from its maximum value. This means that if we can come up with similarity scores that make sense, we can cluster just about any type of data in a meaningful way.
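To make the link between similarity and clustering concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the `similarity` function (the fraction of positions at which two equal-length strings agree), the conversion `1 - similarity` (which only makes sense because this score lies between 0 and 1), the single-linkage merge rule, and the sample words. It is a naive version of the idea, not the implementation a library would use.

```python
from itertools import combinations


def similarity(a, b):
    """Hypothetical score: fraction of positions where two equal-length strings agree."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def dissimilarity(a, b):
    # Assumes similarity lies in [0, 1], so 1 - similarity does too.
    return 1.0 - similarity(a, b)


def cluster_distance(c1, c2):
    # Single linkage (an assumed choice): distance between the closest pair of members.
    return min(dissimilarity(a, b) for a in c1 for b in c2)


def single_linkage(items, n_clusters):
    """Naive agglomerative clustering: merge the two closest clusters
    until only n_clusters remain."""
    clusters = [[item] for item in items]
    while len(clusters) > n_clusters:
        # Pick the pair of cluster indices (i < j) with the smallest distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: cluster_distance(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i].extend(clusters[j])
        del clusters[j]  # safe because j > i
    return clusters


# Made-up sample data: five-letter words.
words = ["karma", "karts", "darts", "melon", "felon"]
result = single_linkage(words, 2)
print(result)  # [['karma', 'karts', 'darts'], ['melon', 'felon']]
```

The clustering code never looks inside the objects; it only calls `dissimilarity`. Swapping in a different similarity score is enough to cluster a different type of data.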
In this section, I will be focusing on Jaccard similarity, which is related...