Introduction
Looking for patterns in a dataset is a large part of data analysis. Of course, a dataset of any real complexity is too large for the human mind to find patterns in unaided, so we rely on computers, statistics, and machine learning to augment our insight.
In this chapter, we'll take a look at a number of methods used to cluster and classify data. Depending on the nature of the data and the questions we're trying to answer, different algorithms will be more or less useful. For instance, while K-Means clustering works well on numeric datasets, it's poorly suited to nominal data, because it depends on computing means and distances, and neither is meaningful for unordered categories.
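To make the point about K-Means and nominal data concrete, here is a minimal sketch in plain Python. It uses neither Weka nor Incanter, and the data and both integer encodings are hypothetical, made up purely for illustration. It shows that a mean is meaningful for a numeric column, while any "distance" between nominal values depends entirely on the arbitrary codes we happen to assign.

```python
from statistics import mean

# Numeric feature: averaging the values gives a meaningful cluster center.
heights_cm = [150.0, 160.0, 170.0]
numeric_centroid = mean(heights_cm)

# Nominal feature: to compute distances we must first invent integer codes.
# Both encodings below are hypothetical and equally (in)valid.
encoding_a = {"red": 0, "green": 1, "blue": 2}
encoding_b = {"red": 2, "green": 0, "blue": 1}


def distance(encoding, x, y):
    """Absolute difference between the integer codes of two nominal values."""
    return abs(encoding[x] - encoding[y])


# Under encoding_a, red is closer to green than to blue.
a_green = distance(encoding_a, "red", "green")
a_blue = distance(encoding_a, "red", "blue")

# Under encoding_b, the relationship flips: red is closer to blue.
b_green = distance(encoding_b, "red", "green")
b_blue = distance(encoding_b, "red", "blue")

print(numeric_centroid, (a_green, a_blue), (b_green, b_blue))
```

Since the colors have no natural order, nothing favors one encoding over the other, so any clusters built on these distances would reflect our choice of codes rather than the data.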
Most of the recipes in this chapter will use the Weka machine learning and data mining library (http://www.cs.waikato.ac.nz/ml/weka/). This full-featured library provides many different procedures and algorithms for analyzing data. It includes a more complete set of these algorithms than Incanter, which we've been using a lot so far. We'll start by seeing...