Modeling and evaluation
Having created our data frame, df, we can begin to develop the clustering algorithms. We will start with hierarchical and then try our hand at k-means. After this, we will need to manipulate our data a little bit to demonstrate how to incorporate mixed data with Gower and Random Forest.
Hierarchical clustering
To build a hierarchical cluster model in R, you can use the hclust() function in the base stats package. The two primary inputs needed for the function are a distance matrix and the clustering method. The distance matrix is easily computed with the dist() function. For the distance, we will use Euclidean distance. A number of clustering methods are available, and the default for hclust() is complete linkage. We will try this, but I also recommend Ward's linkage method. Ward's method tends to produce clusters with a similar number of observations.
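As a minimal sketch of the workflow just described, the following builds a Euclidean distance matrix with dist() and passes it to hclust() twice, once with the default complete linkage and once with Ward's linkage. The data frame toy is simulated purely for illustration and stands in for df; the choice of "ward.D2" as the Ward variant and the cut at two clusters are assumptions, not settings taken from the text.

```r
# Hypothetical stand-in for df: 20 simulated observations in two groups.
set.seed(123)
toy <- data.frame(
  x = c(rnorm(10, mean = 0), rnorm(10, mean = 5)),
  y = c(rnorm(10, mean = 0), rnorm(10, mean = 5))
)

# Distance matrix using Euclidean distance (also the default for dist()).
d <- dist(toy, method = "euclidean")

# Complete linkage is the default method for hclust().
hc_complete <- hclust(d)

# Ward's linkage. "ward.D2" is assumed here because it accepts plain
# (unsquared) Euclidean distances; "ward.D" expects squared distances.
hc_ward <- hclust(d, method = "ward.D2")

# Cut each tree into two clusters (k = 2 is an arbitrary choice)
# and tabulate the cluster sizes for comparison.
sizes_complete <- table(cutree(hc_complete, k = 2))
sizes_ward <- table(cutree(hc_ward, k = 2))
print(sizes_complete)
print(sizes_ward)
```

Calling plot(hc_complete) or plot(hc_ward) draws the corresponding dendrogram, which is a common way to compare how the two linkage methods group the observations.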
The complete linkage method results in the distance between any two clusters that is the maximum distance between...