Segmentation
Segmentation
(as shown in the following screenshot) is very different from Classification
. With segmentation or clustering, the main idea is that you do not believe that all the cases in the dataset are similar. That is, you believe there are distinct groups of people, and these groups should be looked at separately. Often, cluster analysis is used in marketing campaigns so that not all customers receive the same ads, but instead receive the appropriate ads. In this scenario, there is no dependent variable, only independent variables that are used to segment the cases.
K-means
is the oldest of the techniques and has been popular and widely available for decades. It uses the distances between cases (on scale variables only) to determine similarity. Here, distance can be thought of quite literally—Euclidian distance is a common method. Cases whose values are close on the scale variables are grouped into clusters in an attempt to find homogeneous subsets. Determining how many clusters...