Practical Machine Learning with R: Define, build, and evaluate machine learning models for real-world applications

Product type: Paperback
Published: Aug 2019
Publisher: Packt
ISBN-13: 9781838550134
Length: 416 pages
Edition: 1st Edition
Authors (3): Brindha Priyadarshini Jeyaraman, Ludvig Renbo Olsen, Monicah Wambugu

Chapter 6: Unsupervised Learning

Activity 20: Perform DIANA, AGNES, and k-means on the Built-In Motor Car Dataset

Solution:

  1. Attach the cluster and factoextra packages:

    library(cluster)
    library(factoextra)

  2. Load the dataset:

    df <- read.csv("mtcars.csv")

  3. Set the row names to the values of the X column (the car model names). Remove the X column afterward:

    rownames(df) <- df$X
    df$X <- NULL

    Note

    The row names (car models) become a column, X, when the data is saved as a CSV file. So, we need to change it back, as the row names are used to label the plots in the later steps.

  4. Remove rows with missing data and standardize the dataset:

    df <- na.omit(df)
    df <- scale(df)

  5. Implement divisive hierarchical clustering using DIANA. For easy comparison, document the dendrogram output. Feel free to experiment with different distance metrics:

    dv <- diana(df, metric = "manhattan", stand = TRUE)
    plot(dv)

    The output is as follows:

    Figure 6.41: Banner from diana()

    The next plot is as follows:

    Figure 6.42: Dendrogram from diana()
  6. Implement bottom-up hierarchical clustering using AGNES. Take note of the dendrogram created for comparison purposes later on:

    agn <- agnes(df)
    pltree(agn)

    The output is as follows:

    Figure 6.43: Dendrogram from agnes()
  7. Implement k-means clustering. Use the elbow method to determine the optimal number of clusters:

    # Use the standardized df so the elbow plot is based on the same data that is clustered in step 8
    fviz_nbclust(df, kmeans, method = "wss") +
        geom_vline(xintercept = 4, linetype = 2) +
        labs(subtitle = "Elbow method")

    The output is as follows:

    Figure 6.44: Optimal clusters using the elbow method
  8. Perform k-means clustering with four clusters:

    k4 <- kmeans(df, centers = 4, nstart = 20)
    fviz_cluster(k4, data = df)

    The output is as follows:

    Figure 6.45: k-means with four clusters
  9. Compare the clusters, starting with the smallest one. The following are your expected results for DIANA, AGNES, and k-means, respectively:
Figure 6.46: Dendrogram from running DIANA, cut at 20

Cutting the DIANA tree at height 20 places the Ferrari in the smallest cluster, together with the Ford and the Maserati.

Figure 6.47: Dendrogram from agnes, cut at 4

Meanwhile, cutting the AGNES dendrogram at height 4 clusters the Ferrari with the Mazda RX4, the Mazda RX4 Wag, and the Porsche. The k-means solution clusters the Ferrari with the Mazdas, the Ford, and the Maserati.

Figure 6.48: k-means clustering
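
The cluster memberships read off the dendrograms can also be extracted in code. The following is a minimal sketch, assuming R's built-in mtcars data frame in place of the CSV file from step 2. Converting the trees with as.hclust() and cutting them with cutree() at the heights used in the figures is one possible approach, not necessarily how the figures were produced, and ferrari_mates is a hypothetical helper introduced only for illustration.

```r
library(cluster)

# Assumption: the built-in mtcars data stands in for mtcars.csv from step 2
df <- scale(na.omit(mtcars))

# Refit the two hierarchical models from steps 5 and 6
dv  <- diana(df, metric = "manhattan", stand = TRUE)
agn <- agnes(df)

# Cut each tree at the heights used in Figures 6.46 and 6.47
diana_groups <- cutree(as.hclust(dv), h = 20)
agnes_groups <- cutree(as.hclust(agn), h = 4)

# Hypothetical helper: names of the cars that share a cluster with the Ferrari
ferrari_mates <- function(groups) {
  names(groups)[groups == groups["Ferrari Dino"]]
}

ferrari_mates(diana_groups)
ferrari_mates(agnes_groups)
```

Each call returns the row names in the Ferrari's cluster, which can be checked against the groupings described for the dendrograms.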

Clearly, the choice of clustering technique and algorithm changes which clusters are created. It is important to apply domain knowledge to decide which result is the most useful.
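
One way to make such a comparison concrete is to cross-tabulate the cluster labels from two methods. The following is a minimal sketch, again assuming the built-in mtcars data in place of the CSV file. Cutting the AGNES tree into four groups with k = 4 and fixing the seed with set.seed(42) are illustrative choices, not part of the activity.

```r
library(cluster)

# Assumption: the built-in mtcars data stands in for mtcars.csv from step 2
df <- scale(na.omit(mtcars))

set.seed(42)  # arbitrary seed so the k-means result is repeatable
k4  <- kmeans(df, centers = 4, nstart = 20)
agn <- agnes(df)

# Cut the AGNES tree into four groups to match the k-means solution
agnes_k4 <- cutree(as.hclust(agn), k = 4)

# Rows: k-means clusters; columns: AGNES clusters; cells: number of cars
comparison <- table(kmeans = k4$cluster, agnes = agnes_k4)
comparison
```

A row whose count is concentrated in a single column marks a cluster on which the two methods largely agree. The label numbers themselves are arbitrary, so only the grouping of cars matters.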
