Finding an optimal number of groups for one variable
The first task in grouping statistics is to find the optimal number of groups for our data. Keep these facts in mind as you look at Figure 5.1: the goal is to minimize the distance from each point in a group to that group's centroid, or group average.
A good grouping shows up as a small standard deviation within each group. A data point that lies far from its group centroid is an outlier. Such points need further research because they could represent risky behavior.
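The idea of measuring each point against its group centroid can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the absent-hours values are made up, the function name `flag_outliers` is hypothetical, and the cutoff of two standard deviations is an arbitrary choice for the example, not a rule taken from this chapter.

```python
from statistics import mean, pstdev

def flag_outliers(values, threshold=2.0):
    """Return (centroid, spread, outliers) for one group of one-variable data.

    threshold is an assumed cutoff: a point is flagged when its distance
    to the centroid exceeds threshold times the group's standard deviation.
    """
    centroid = mean(values)   # group average
    spread = pstdev(values)   # population standard deviation of the group
    outliers = [
        v for v in values
        if spread > 0 and abs(v - centroid) > threshold * spread
    ]
    return centroid, spread, outliers

# Made-up absent hours for one group of employees (illustrative only).
absent_hours = [4, 5, 6, 5, 4, 6, 5, 30]
centroid, spread, outliers = flag_outliers(absent_hours)
print("centroid:", centroid)
print("standard deviation:", round(spread, 2))
print("points to research further:", outliers)
```

In this sample, the single large value pulls the centroid upward and inflates the standard deviation, which is why one distant point can make a whole group look less compact than it is.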
With these facts in mind, look at Figure 5.1 and notice how difficult it is to tell, at a glance, how many groups best describe the sales per product and the number of absent hours due to sickness in a human resources case study:
To find the optimal number of groups, we need the K-means elbow chart, which plots the total within-group distance to the centroids against the number of groups. Choose the number of groups where the curve starts to flatten...
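The numbers behind an elbow chart can be sketched as follows for a single variable. This is a minimal illustration, not the chapter's own procedure: the data is made up, the function name `kmeans_1d` is hypothetical, and the evenly spaced starting centroids are an assumption chosen to keep the result repeatable. The value computed for each number of groups is the within-group sum of squared distances to the nearest centroid.

```python
def kmeans_1d(values, k, iterations=100):
    """Simple one-variable K-means; returns (centroids, wcss).

    wcss is the within-group sum of squared distances to the centroids.
    """
    data = sorted(values)
    n = len(data)
    # Assumed deterministic start: spread initial centroids over the sorted data.
    centroids = [data[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        groups = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            groups[nearest].append(v)
        # Move each centroid to the average of its group (keep it if the group is empty).
        new_centroids = [
            sum(g) / len(g) if g else centroids[j]
            for j, g in enumerate(groups)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    wcss = sum(min((v - c) ** 2 for c in centroids) for v in data)
    return centroids, wcss

# Made-up one-variable data with three visible clumps (illustrative only).
sales = [10, 11, 12, 13, 50, 51, 52, 53, 90, 91, 92, 93]
wcss_by_k = [kmeans_1d(sales, k)[1] for k in range(1, 6)]
for k, wcss in enumerate(wcss_by_k, start=1):
    print(k, wcss)
```

Plotting these values against the number of groups gives the elbow curve. For this made-up sample the value falls sharply up to three groups and barely changes afterwards, so three would be the number to choose.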