Quantifying similarities
The reliability of the groups produced by a clustering algorithm rests on the assumption that we can accurately quantify the similarity, or closeness, between data points in the problem space. We do this with a distance measure: the smaller the distance between two points, the more similar they are taken to be. The following are three of the most popular measures used to quantify similarity:
- Euclidean distance measure
- Manhattan distance measure
- Cosine distance measure
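As a quick orientation, the three measures can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: the function names are our own, the points are assumed to be equal-length sequences of numbers, and cosine distance is taken here to be one minus the cosine similarity, which assumes neither vector is all zeros.

```python
import math

def euclidean_distance(a, b):
    # Straight-line distance: square root of the summed squared differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan_distance(a, b):
    # Grid distance: sum of the absolute differences along each dimension.
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    # One minus the cosine of the angle between the two vectors.
    # Assumes both vectors have nonzero length.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Two illustrative points in a two-dimensional problem space.
p = (1.0, 2.0)
q = (4.0, 6.0)

print(euclidean_distance(p, q))  # 5.0
print(manhattan_distance(p, q))  # 7.0
print(cosine_distance(p, q))     # small value: the vectors point in similar directions
```

Note that the same pair of points gets a different score under each measure. Euclidean and Manhattan distance depend on how far apart the points are, while cosine distance depends only on the angle between them, so the choice of measure affects which points a clustering algorithm treats as close.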
Let's look at these distance measures in more detail.