Answers
Here are the answers to the preceding questions:
- No. We first inspect the data visually to estimate the likely number of groups. Then, we run the K-means elbow function to find the optimal number of groups for the data.
- We choose the number of groups at which the elbow curve starts to flatten, because adding more groups beyond that point gives only a small improvement.
- The following are the parameters needed to execute the K-means function:
  - The number of groups to process
  - The range of the input data
  - The range in which to put the results returned by the K-means function
- Use a pivot table and chart analysis with the minimum, maximum, and centroid values for each group. If the minimum and maximum values are far from the average (centroid), the group has scattered values.
- Scattered values that lie far from the centroid are possible outliers that need further investigation. One approach is to run K-means again on this group alone, creating subgroups that improve the data classification.
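
The answers above can also be illustrated outside Excel. The following is a minimal sketch in plain Python, not the Excel function used in this chapter. The data values are made up for illustration, and the helper names (`kmeans_1d`, `inertia`, `summarize`) are hypothetical. It computes the within-group sum of squared distances for several group counts (the values plotted in an elbow chart) and then builds the minimum, centroid, and maximum for each group, as a pivot table would.

```python
# Illustrative one-dimensional data (assumed values, not from the chapter).
data = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9, 9.0, 9.2, 8.7, 12.5]


def kmeans_1d(values, k, iterations=100):
    """Basic K-means (Lloyd's algorithm) on 1-D data; returns (centroids, labels)."""
    ordered = sorted(values)
    # Deterministic start (an assumption): spread the centroids over the sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assignment step: each value joins the group of its nearest centroid.
        labels = [min(range(k), key=lambda j: abs(v - centroids[j])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for j in range(k):
            members = [v for v, g in zip(values, labels) if g == j]
            updated.append(sum(members) / len(members) if members else centroids[j])
        if updated == centroids:
            break  # Centroids stopped moving, so the groups are stable.
        centroids = updated
    return centroids, labels


def inertia(values, centroids, labels):
    """Sum of squared distances from each value to its group centroid."""
    return sum((v - centroids[g]) ** 2 for v, g in zip(values, labels))


def summarize(values, centroids, labels):
    """Minimum, centroid, and maximum per group, like the pivot table."""
    rows = []
    for j, c in enumerate(centroids):
        members = [v for v, g in zip(values, labels) if g == j]
        if members:
            rows.append({"group": j, "min": min(members),
                         "centroid": c, "max": max(members)})
    return rows


# Elbow data: one inertia value per candidate number of groups.
inertias = {}
for k in range(1, 6):
    c, l = kmeans_1d(data, k)
    inertias[k] = inertia(data, c, l)

# Group summary for the chosen number of groups (3 is assumed here).
centroids, labels = kmeans_1d(data, 3)
summary = summarize(data, centroids, labels)
for row in summary:
    print(row)
```

In this sketch, a group whose maximum or minimum lies far from its centroid is a candidate for further inspection. One option is to call `kmeans_1d` again on that group's members alone to split it into subgroups.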