Let's consider the following dataset:
![](https://static.packt-cdn.com/products/9781785889622/graphics/assets/c08afb7f-e274-4411-9578-aa3d47c81510.png)
We define the affinity as a metric function of two arguments with the same dimensionality m. The most common metrics (also supported by scikit-learn) are:
- Euclidean or L2:
![](https://static.packt-cdn.com/products/9781785889622/graphics/assets/e212c44d-6a95-44da-be7e-1fe94d8c8922.png)
- Manhattan (also known as City Block) or L1:
![](https://static.packt-cdn.com/products/9781785889622/graphics/assets/ef262d3f-a906-43fa-8302-4881ae300301.png)
- Cosine distance:
![](https://static.packt-cdn.com/products/9781785889622/graphics/assets/2cc089a9-2d45-40d4-9510-5562e7dbf47f.png)
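All three metrics can be computed with scikit-learn's `pairwise_distances` function. The following is a minimal sketch with two hypothetical sample points (the values are chosen only for illustration):

```python
import numpy as np
from sklearn.metrics import pairwise_distances

# Two hypothetical points in a 2-dimensional space
X = np.array([[1.0, 2.0],
              [4.0, 6.0]])

# Euclidean (L2) distance
d_l2 = pairwise_distances(X, metric='euclidean')

# Manhattan (City Block, L1) distance
d_l1 = pairwise_distances(X, metric='manhattan')

# Cosine distance (1 - cosine similarity)
d_cos = pairwise_distances(X, metric='cosine')

print(d_l2[0, 1])  # 5.0 (a 3-4-5 right triangle)
print(d_l1[0, 1])  # 7.0 (|4 - 1| + |6 - 2|)
print(d_cos[0, 1])
```

Each call returns a symmetric matrix whose element (i, j) is the distance between the i-th and j-th rows of `X`.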
The Euclidean distance is normally a good choice, but sometimes it's useful to have a metric whose difference from the Euclidean one gets larger and larger as the points move apart. The Manhattan metric has this property; to show it, the following figure plots the distance from the origin of points belonging to the line y = x:
![](https://static.packt-cdn.com/products/9781785889622/graphics/assets/bf7f3963-edb7-44b5-8fd7-2cd189f4aa6f.png)
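The growing gap shown in the figure can be verified numerically. For a point (t, t) on the line y = x, the Euclidean distance from the origin is √2·t while the Manhattan distance is 2t, so their difference grows linearly with t. A short sketch (the sample values of t are arbitrary):

```python
import numpy as np

# Points on the line y = x, progressively farther from the origin
t = np.array([1.0, 5.0, 10.0, 50.0])
points = np.stack([t, t], axis=1)

euclidean = np.sqrt((points ** 2).sum(axis=1))  # equals sqrt(2) * t
manhattan = np.abs(points).sum(axis=1)          # equals 2 * t

# The gap between the two metrics grows linearly with t
gap = manhattan - euclidean
print(gap)  # (2 - sqrt(2)) * t for each point
```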
The cosine distance, instead, is useful when we need a distance proportional to the angle between two vectors. If the directions are the same, the distance is null, while it is maximum when the angle is equal to 180° (meaning opposite...