Let's consider the following dataset:
data:image/s3,"s3://crabby-images/1d273/1d27352b9248900af39590eeae2c81ee9f32fc0a" alt=""
We define affinity as a metric function of two arguments with the same dimensionality m. The most common metrics (also supported by scikit-learn) are:
- Euclidean or L2:
data:image/s3,"s3://crabby-images/6fa7b/6fa7b8b0e79d8648cd1479f1cf0b89f5d2970c8b" alt=""
- Manhattan (also known as City Block) or L1:
data:image/s3,"s3://crabby-images/f88ad/f88ad98fff9359632ff20b23d67af6363d68a3aa" alt=""
- Cosine distance:
data:image/s3,"s3://crabby-images/13543/13543004860e3b5fb01c069d38fb1e1c03f61184" alt=""
The Euclidean distance is normally a good choice, but sometimes it's useful to have a metric whose difference from the Euclidean one grows larger and larger as the points move farther apart. The Manhattan metric has this property; to show it, the following figure plots the distances from the origin of points belonging to the line y = x:
data:image/s3,"s3://crabby-images/95dd4/95dd44ec1c676b0a7cee105c7083843592e30d2a" alt=""
The cosine distance, instead, is useful when we need a distance that depends only on the angle between two vectors (it increases monotonically with the angle, although it is not proportional to it). If the direction is the same, the distance is zero, while it is maximum when the angle is equal to 180° (meaning opposite...