In this recipe, we'll look at both the debate around, and the mechanics of, using k-means for outlier detection. The technique can be useful for isolating some types of errors, but it should be applied with care.
Using k-means for outlier detection
Getting ready
We'll use k-means to detect outliers in a cluster of points. It's important to note that there are many camps when it comes to outliers and outlier detection. On one hand, by removing outliers we may be discarding points that were genuinely produced by the data-generating process. On the other hand, outliers can be due to measurement error or some other outside factor.
This is the most credence we'll give to the debate. The rest of this recipe is about finding outliers...
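As a rough sketch of the mechanics, assuming scikit-learn's `KMeans` and synthetic data from `make_blobs`: fit k-means (here with a single cluster), measure each point's distance to its nearest centroid, and flag the farthest few points as outliers. The helper name `kmeans_outliers`, the choice of five outliers, and the blob parameters are illustrative assumptions, not taken from the recipe.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Hypothetical data: a single blob of 100 points in 2-D.
X, _ = make_blobs(n_samples=100, n_features=2, centers=1,
                  cluster_std=1.0, random_state=0)


def kmeans_outliers(X, n_clusters=1, n_outliers=5, random_state=0):
    """Hypothetical helper: flag the n_outliers points farthest from
    their nearest k-means centroid."""
    km = KMeans(n_clusters=n_clusters, n_init=10,
                random_state=random_state).fit(X)
    # transform() returns the distance from each point to every centroid,
    # shape (n_samples, n_clusters); keep the distance to the nearest one.
    dist = km.transform(X).min(axis=1)
    # Indices of the n_outliers largest distances, largest first.
    idx = np.argsort(dist)[::-1][:n_outliers]
    return idx, dist, km


outlier_idx, distances, km = kmeans_outliers(X)
X_clean = np.delete(X, outlier_idx, axis=0)  # data with flagged points removed
print("flagged indices:", outlier_idx)
print("shape before/after:", X.shape, X_clean.shape)
```

With a single cluster, the fitted centroid is simply the mean of the data, so refitting on `X_clean` shows how much the flagged points were pulling the centroid. That shift is one way to judge whether removing them is justified or whether they are legitimate draws from the data-generating process.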