Grouping and clustering methods have a very simple goal, and I want you to keep this goal in mind throughout this entire chapter.
The goal of clustering: Group similar things together, while separating dissimilar things.
That is the beginning and end of the motivation, but of course, as with other data mining tasks, the devil is in the details.
So, let's start the discussion by brainstorming what types of mathematical machinery we will need to get this task done right. We will need a quantitative way to describe the following three things:
- Location of group: A way to define where a group sits in a space that may span many dimensions
- Similarity: What it means for a data point to be similar or dissimilar to other data points
- Termination Condition: When to stop grouping, preferably without human intervention
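To make the three items a little more concrete, here is a minimal sketch of one common set of choices. It is only an illustration under stated assumptions, not the only way to define them. It assumes numeric data points stored as tuples of equal length. The sketch uses the coordinate-wise mean (the centroid) as a group's location and Euclidean distance as the similarity measure, where smaller distance means more similar. It stops grouping once no group's location moves by more than a small tolerance. The function names `centroid`, `distance`, and `has_converged` and the parameter `tol` are hypothetical names chosen for this example.

```python
import math

def centroid(points):
    """Location of a group (assumed choice): the coordinate-wise mean of its points."""
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dims))

def distance(a, b):
    """Similarity (assumed choice): Euclidean distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def has_converged(old_centers, new_centers, tol=1e-6):
    """Termination condition (assumed choice): stop when no group location
    has moved more than `tol` since the previous iteration."""
    return all(distance(o, n) <= tol for o, n in zip(old_centers, new_centers))

if __name__ == "__main__":
    group = [(1.0, 2.0), (3.0, 4.0)]
    print(centroid(group))                                # (2.0, 3.0)
    print(distance((0.0, 0.0), (3.0, 4.0)))               # 5.0
    print(has_converged([(2.0, 3.0)], [(2.0, 3.0)]))      # True
```

Other choices are equally legitimate: a medoid instead of a mean for location, cosine or Manhattan distance for similarity, or a fixed iteration budget as the stopping rule. The point is only that each of the three items needs some explicit, quantitative definition before an algorithm can run without a human deciding when it is done.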
If we can find a way to get these three things defined...