Using K-means for Predictive Analytics
K-means is a clustering algorithm that tries to cluster related data points together. However, we should know its working principle and mathematical operations.
How K-means Works
Suppose we have n data points, xi, i = 1...n, that need to be partitioned into k clusters. Now that the target here is to assign a cluster to each data point, K-means aims to find the positions, μi, i=1...k, of the clusters that minimize the distance from the data points to the cluster. Mathematically, the K-means algorithm tries to achieve the goal by solving an equation that is an optimization problem:
In the previous equation, ci is a set of data points, which when assigned to cluster i andis the Euclidean distance to be calculated (we will explain why we should use this distance measurement shortly). Therefore, we can see that the overall clustering operation using K-means is not a trivial one, but a NP-hard optimization problem. This also means that the K-means algorithm...