K-Means Clustering
K-means clustering is a very common unsupervised learning technique with a wide range of applications. It is powerful because it is conceptually relatively simple, scales to very large datasets, and tends to work well in practice. In this section, you will learn the conceptual foundations of k-means clustering, how to apply k-means clustering to data, and how to deal with high-dimensional data (that is, data with many different variables) in the context of clustering.
K-means clustering is an algorithm that tries to find the best way of grouping data points into k different groups, where k is a parameter given to the algorithm. For now, we will choose k arbitrarily. We will revisit how to choose k in practice in the next chapter. The algorithm then works iteratively to try to find the best grouping. There are two steps to this algorithm:
- The algorithm begins by randomly selecting k points in space to be the centroids of the clusters. Each data point is then...