Defining the Clustering Problem
We define the clustering problem so that we can find similarities between our data points. For instance, suppose we have a dataset that consists of points. Clustering helps us understand the structure of this dataset by describing how the points are distributed.
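To make the idea of similarity concrete, here is a minimal sketch that assumes similarity is measured by Euclidean distance, so that a smaller distance means more similar points. The four points and the helper name pairwise_distances are hypothetical, invented purely for illustration; they are not the data shown in the figures.

```python
from itertools import combinations
from math import dist

# Hypothetical 2D points, made up for illustration only.
points = {
    "A": (1.0, 1.0),
    "B": (1.5, 1.2),
    "C": (8.0, 8.0),
    "D": (8.3, 7.6),
}


def pairwise_distances(pts):
    """Return {(name1, name2): Euclidean distance} for every pair of points."""
    return {
        (a, b): dist(pts[a], pts[b])
        for a, b in combinations(sorted(pts), 2)
    }


distances = pairwise_distances(points)

# Print the pairs from closest to farthest apart.
for (a, b), d in sorted(distances.items(), key=lambda kv: kv[1]):
    print(f"{a}-{b}: {d:.2f}")
```

In this toy data, A and B lie close together, as do C and D, while every pair that mixes the two groups is far apart. That gap between small and large distances is the kind of structure a clustering algorithm looks for.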
Let's look at an example of data points in a two-dimensional space in Figure 5.1:
Now, have a look at Figure 5.2. It is evident that there are three clusters:
The three clusters were easy to detect because the points are close to one another. Here, you can see that clustering determines the data points that are close to each other. You may have also noticed that the data points M1, O1, and N1 do not belong to any cluster; these are the outlier points. The clustering algorithm you build should be prepared...