K-nearest neighbors
This algorithm belongs to a particular family called instance-based algorithms (the methodology is called instance-based learning).
It differs from other approaches because it doesn't work with an actual mathematical model. On the contrary, the inference is performed by direct comparison of new samples with existing ones (which are defined as instances). KNN is an approach that can be easily employed to solve clustering, classification, and regression problems (even though, in this case, we are going to consider only the first technique). The main idea behind the clustering algorithm is very simple. Let's consider a data generating process pdata and finite a dataset drawn from this distribution:
Each point has a dimensionality equal to N. We can now introduce a distance function , which in most cases can be generalized with the Minkowski distance:
When p = 2, represents the classical Euclidean distance, that is normally the ...