Chapter 7. Classification Analysis
In the context of data analysis, the main idea of classification is to partition a dataset into labeled subsets. If the dataset is a table in a database, then this partitioning may amount to nothing more than adding a new attribute (that is, a new table column) whose domain (that is, range of values) is a set of labels.
For example, we might have the table of 16 fruits shown in Table 7-1:
The last column, labeled Sweet, is a nominal attribute that can be used to classify fruit: either it's sweet or it isn't. Presumably, every fruit type that exists could be classified by this attribute. If you see an unknown fruit in the grocery store and wonder whether it is sweet, a classification algorithm could predict the answer, based upon the other attributes that you can observe: {Size, Color, Surface}. We will see how to do that later in the chapter.
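The idea of classification as a partition into labeled subsets can be sketched in a few lines of Python. This is only an illustration, not the chapter's own code: the four rows are hypothetical and are not taken from Table 7-1, and the `partition` helper and the "yes"/"no" labels are names assumed here for the example.

```python
from collections import defaultdict

# Hypothetical rows for illustration only; NOT the contents of Table 7-1.
fruits = [
    {"Name": "apple", "Size": "medium", "Color": "red",    "Surface": "smooth", "Sweet": "yes"},
    {"Name": "lemon", "Size": "small",  "Color": "yellow", "Surface": "rough",  "Sweet": "no"},
    {"Name": "grape", "Size": "small",  "Color": "purple", "Surface": "smooth", "Sweet": "yes"},
    {"Name": "lime",  "Size": "small",  "Color": "green",  "Surface": "rough",  "Sweet": "no"},
]

def partition(rows, attribute):
    """Group row names into labeled subsets, one per value of `attribute`."""
    subsets = defaultdict(list)
    for row in rows:
        subsets[row[attribute]].append(row["Name"])
    return dict(subsets)

# Each fruit lands in exactly one subset, so the labels partition the dataset.
classes = partition(fruits, "Sweet")
print(classes)
```

Here the Sweet column is already filled in, so the partition is simply read off the table. A classification algorithm addresses the harder case in which the label of a new row is unknown and has to be predicted from the observable attributes.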
A classification algorithm is a...