Data understanding and preparation
Let's start by loading the R packages that we will need for this chapter. As always, make sure that you have installed them first:
> library(cluster)       # conduct cluster analysis
> library(compareGroups) # build descriptive statistic tables
> library(HDclassif)     # contains the dataset
> library(NbClust)       # cluster validity measures
> library(sparcl)        # colored dendrogram
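If you are not sure which of these packages are already installed, you can check for them before calling library(). The following is a minimal sketch that uses only base R. The helper name find_missing() is hypothetical and does not come from any of the packages above. It reports the missing packages; the install.packages() call is left commented out so that nothing is downloaded unless you choose to.

```r
# Packages used in this chapter
pkgs <- c("cluster", "compareGroups", "HDclassif", "NbClust", "sparcl")

# Hypothetical helper: return the subset of 'pkgs' that cannot be loaded
# in the current R session (requireNamespace() returns FALSE for those)
find_missing <- function(pkgs) {
  pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
}

to_install <- find_missing(pkgs)
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install them from CRAN
}
```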
The wine dataset is in the HDclassif package, which we have already installed, so we can load the data and examine its structure with the str() function:
> data(wine)
> str(wine)
'data.frame':   178 obs. of  14 variables:
 $ class: int  1 1 1 1 1 1 1 1 1 1 ...
 $ V1   : num  14.2 13.2 13.2 14.4 13.2 ...
 $ V2   : num  1.71 1.78 2.36 1.95 2.59 1.76 1.87 2.15 1.64 1.35 ...
 $ V3   : num  2.43 2.14 2.67 2.5 2.87 2.45 2.45 2.61 2.17 2.27 ...
 $ V4   : num  15.6 11.2 18.6 16.8 21 15.2 14.6 17.6 14 16 ...
 $ V5   : int  127 100 101 113 118 112 96 121 97 98 ...
 $ V6   : num  2.8 2.65 2.8 3.85...
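The str() output shows that the features are on very different scales. V5 is in the hundreds, while V2 and V3 stay roughly between 1 and 3. Distance-based methods such as hierarchical clustering and k-means with Euclidean distance are sensitive to scale, so a common preparation step is to set the class label aside and standardize each feature to mean 0 and standard deviation 1. The following is a minimal sketch under that assumption. prep_features() is a hypothetical helper built on base R's scale(). The class column name comes from the str() output above.

```r
# Hypothetical helper: drop the label column and standardize the remaining
# numeric features (each column centered to mean 0, scaled to sd 1)
prep_features <- function(df, label_col = "class") {
  features <- df[, setdiff(names(df), label_col), drop = FALSE]
  as.data.frame(scale(features))
}

if (requireNamespace("HDclassif", quietly = TRUE)) {
  data(wine, package = "HDclassif")
  print(table(wine$class))    # number of observations in each class
  print(sum(is.na(wine)))     # total count of missing values
  wine_scaled <- prep_features(wine)
  print(round(colMeans(wine_scaled), 10))  # column means, approximately 0
  print(apply(wine_scaled, 2, sd))         # column standard deviations, 1
}
```

Keeping the class label out of the scaled data lets you compare it with the clusters later without letting it influence how they are formed.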