In this section, we discuss how to segment a customer base into subgroups using a clustering algorithm in R. By the end of this section, we will have built a customer segmentation model using the k-means clustering algorithm. Readers who would prefer to use Python instead of R for this exercise can refer to the previous section.
For this exercise, we will use a publicly available dataset from the UCI Machine Learning Repository, which can be found at this link: http://archive.ics.uci.edu/ml/datasets/online+retail. You can follow this link and download the data, which is available in XLSX format as a file named Online Retail.xlsx. Once you have downloaded the data, you can load it into your R session in RStudio by running the following code, which uses the read_excel function from the readxl package:
library(readxl)
#### 1. Load Data ####
df <- read_excel(
path="~/Documents/data-science...