Understanding the multivariate dataset
A multivariate dataset is a set of observations in which each observation records several attributes (variables) that describe different aspects of a phenomenon. In this chapter, we will use a multivariate dataset that is the result of a chemical analysis of wines grown in the same region of Italy but derived from three different cultivars. The Wine dataset is available in the UC Irvine Machine Learning Repository and can be freely downloaded from http://archive.ics.uci.edu/ml/datasets/Wine. The dataset includes 13 features with no missing data, and all the features are numerical (integer or real values).
The complete list of features is as follows:
- Alcohol
- Malic acid
- Ash
- Alkalinity of ash
- Magnesium
- Total phenols
- Flavanoids
- Nonflavanoid phenols
- Proanthocyanins
- Color intensity
- Hue
- OD280/OD315 of diluted wines
- Proline
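To make the structure of the data concrete, the following minimal sketch parses records into a class label plus the 13 features listed above. It assumes that each line of the downloaded file is comma-separated, with the cultivar label first and the 13 measurements after it in the order of the list. The helper name `parse_wine` and the sample line are illustrative and are not part of the dataset's documentation.

```python
import csv
import io

# Feature names follow the list above. The column order is an assumption:
# class label first, then the 13 measurements in the listed order.
FEATURES = [
    "Alcohol", "Malic acid", "Ash", "Alkalinity of ash", "Magnesium",
    "Total phenols", "Flavanoids", "Nonflavanoid phenols", "Proanthocyanins",
    "Color intensity", "Hue", "OD280/OD315 of diluted wines", "Proline",
]


def parse_wine(text):
    """Parse comma-separated records into (label, {feature: value}) pairs."""
    records = []
    for row in csv.reader(io.StringIO(text)):
        if not row:  # skip blank lines
            continue
        if len(row) != len(FEATURES) + 1:
            raise ValueError(
                f"expected {len(FEATURES) + 1} fields, got {len(row)}"
            )
        label = int(row[0])  # cultivar label
        values = {name: float(v) for name, v in zip(FEATURES, row[1:])}
        records.append((label, values))
    return records


# Illustrative line in the assumed layout (label followed by 13 values).
sample = "1,14.23,1.71,2.43,15.6,127,2.8,3.06,.28,2.29,5.64,1.04,3.92,1065\n"
records = parse_wine(sample)
label, values = records[0]
print(label, values["Alcohol"], values["Proline"])
```

To process the real file, read its contents into a string and pass that to `parse_wine`. Raising an error on a wrong field count is a deliberate choice here, so that a different file layout fails loudly instead of silently mislabeling columns.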
The classes in the dataset are ordered and not balanced; this means...