In this section, we discuss how to segment a customer base into subgroups using a clustering algorithm in Python. By the end of this section, we will have built a customer segmentation model using the k-means clustering algorithm. We will mainly use the pandas, matplotlib, and scikit-learn packages to analyze the data, visualize it, and build machine learning models. Readers who would prefer to use R instead of Python for this exercise can skip to the next section.
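As a rough preview of the kind of model this section works toward, the following minimal sketch runs k-means with scikit-learn on synthetic data. The column names `total_sales` and `order_count`, the generated values, and the choice of three clusters are all assumptions made for illustration only; they are not taken from the dataset used in this exercise.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for per-customer features.
# The column names and values are hypothetical, not the real dataset.
rng = np.random.default_rng(0)
customers = pd.DataFrame({
    "total_sales": rng.gamma(shape=2.0, scale=500.0, size=200),
    "order_count": rng.integers(1, 50, size=200),
})

# Put both features on a comparable scale so that neither one
# dominates the Euclidean distances that k-means relies on.
scaled = StandardScaler().fit_transform(customers)

# n_clusters=3 is an arbitrary choice for this illustration.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
customers["cluster"] = kmeans.fit_predict(scaled)

# Average feature values per segment.
print(customers.groupby("cluster").mean())
```

Scaling before clustering is a design choice rather than a requirement: without it, the feature with the larger numeric range would largely determine the segments.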
For this exercise, we will use a publicly available dataset from the UCI Machine Learning Repository, which can be found at http://archive.ics.uci.edu/ml/datasets/online+retail. Follow this link to download the data, which is provided in XLSX format as a file named Online Retail.xlsx. Once you have downloaded this data, you can load...