Before we start looking at the programming content, let's take a look at clustering models, since we will be using one in our first example.
Clustering seeks to group similar data points together. As a simple example, when there are three data points, each with one column, [1],[2],[6], respectively, we pick one point as the centroid that represents the nearby points; for example, with two centroids, [1.5] and [5], each represents a cluster: one with [1],[2] and another cluster with [6], respectively. These sample clusters can be seen in the following diagram:
When there are two columns for each data point, the distance between the actual data point and the centroid needs to consider the two columns as one data point. We adopt a measurement called Euclidean distance for this.
One of the key challenges of adopting clustering in banking is that it leads to clusters that are too large, which reduces the true...