The datasets we have used so far have been described in terms of features. In the previous chapter, we used a transaction-centric dataset. However, ultimately this was just a different format for representing feature-based data.
There are many other types of datasets, including text, images, sounds, movies, or even real objects. Most data mining algorithms rely on having numerical or categorical features. This means we need a way to represent these types before we input them into the data mining algorithm. We call this representation a model.
In this chapter, we will discuss how to extract numerical and categorical features, and choose the best features when we do have them. We will discuss some common patterns and techniques for extracting features. Choosing your model appropriately is critically important to the outcome of the data mining exercise, more so than the choice...