Similarity and Data Standardization
For a clustering algorithm to find groups of customers, it needs some measure of what it means for two customers to be similar or different. In this section, we will look at how to measure how similar two data points are and how to standardize data to prepare it for clustering.
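The following is a minimal sketch of both ideas. It assumes similarity is measured as Euclidean distance, which is one common choice but not the only one, and it standardizes each feature with a z-score (subtract the column mean, divide by the column's standard deviation). The three customers and their ages and incomes are made up for illustration.

```python
import math
import statistics

# Hypothetical customers: (age in years, annual income in dollars)
customers = [
    (25, 40_000),
    (60, 41_000),
    (26, 90_000),
]

def standardize(rows):
    """Z-score each column: subtract the column mean, divide by the population std. dev."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return [
        tuple((value - m) / s for value, m, s in zip(row, means, stds))
        for row in rows
    ]

scaled = standardize(customers)

# Raw Euclidean distances: income (in dollars) swamps age (in years)
raw_0_1 = math.dist(customers[0], customers[1])
raw_0_2 = math.dist(customers[0], customers[2])

# Standardized distances: both features contribute on a comparable scale
std_0_1 = math.dist(scaled[0], scaled[1])
std_0_2 = math.dist(scaled[0], scaled[2])

print(f"raw:          d(0,1)={raw_0_1:.1f}  d(0,2)={raw_0_2:.1f}")
print(f"standardized: d(0,1)={std_0_1:.2f}  d(0,2)={std_0_2:.2f}")
```

On the raw values, customer 1 (35 years older, similar income) looks about 50 times closer to customer 0 than customer 2 (same age, much higher income), purely because income is measured in larger units. After standardization, the two distances come out roughly equal, so neither feature dominates just because of its units.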
Determining Similarity
To use clustering for customer segmentation (grouping customers together with other customers who have similar traits), you first have to decide what "similar" means; that is, you need to be specific about which traits make two customers alike. The customer traits you use should be those most closely related to the kinds of marketing campaigns you want to run.
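To see why the choice of traits matters, here is a small sketch in which the same three customers have a different "most similar" neighbor depending on which features you compare them on. The field names, values, and the helper functions `distance` and `most_similar` are hypothetical, and the features are left unscaled for brevity (standardizing them would not change the answers in this toy data).

```python
import math

# Hypothetical customer records; field names are illustrative only
customers = {
    "A": {"visits_per_month": 4.0, "avg_basket_usd": 30.0, "age": 22},
    "B": {"visits_per_month": 4.5, "avg_basket_usd": 32.0, "age": 58},
    "C": {"visits_per_month": 1.0, "avg_basket_usd": 120.0, "age": 23},
}

def distance(a, b, features):
    """Euclidean distance between two customers using only the chosen features."""
    return math.dist([a[f] for f in features], [b[f] for f in features])

def most_similar(target, features):
    """Return the ID of the customer closest to `target` on `features`."""
    others = [cid for cid in customers if cid != target]
    return min(others, key=lambda cid: distance(customers[target], customers[cid], features))

# A campaign about shopping habits: compare on purchase behavior only
print(most_similar("A", ["visits_per_month", "avg_basket_usd"]))  # → B
# A campaign aimed at a life stage: compare on age only
print(most_similar("A", ["age"]))  # → C
```

Customer A's nearest neighbor flips from B to C when the feature set changes, which is why the features should be chosen with the intended campaign in mind.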
Ideally, the features you choose should be roughly equally important, in your judgment, for capturing something meaningful about the customer. For example, segmenting customers based on the flavor of toothpaste they tend to buy may not make sense if...