Chapter 4: Introduction to Data Grouping
Data grouping is a machine learning technique that segments large amounts of data into groups of data points with similar behavior. An algorithm such as K-means is useful here because a large amount of data is very difficult to read on a business intelligence chart. Furthermore, once the number of variables is greater than four, it becomes impractical to show them all on a single chart.
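To make the idea concrete, here is a minimal sketch of K-means grouping on data with five variables, which is more than a single chart can comfortably show. It uses only the Python standard library. The function names (`init_farthest_first`, `kmeans`), the synthetic data, and the farthest-first starting rule are assumptions made for this illustration; production libraries usually offer more refined initialization.

```python
import math
import random


def init_farthest_first(points, k):
    """Pick k starting centroids: the first point, then repeatedly the point
    farthest from the centroids chosen so far (a simple, deterministic rule)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points, k, iterations=100):
    """Minimal K-means (Lloyd's algorithm). Returns (centroids, labels)."""
    centroids = init_farthest_first(points, k)
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a group is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Synthetic data (an assumption for this example): two well-separated
# groups of 50 points each, described by five variables.
rng = random.Random(42)
group_a = [[rng.uniform(0.0, 1.0) for _ in range(5)] for _ in range(50)]
group_b = [[rng.uniform(10.0, 11.0) for _ in range(5)] for _ in range(50)]
data = group_a + group_b

centroids, labels = kmeans(data, k=2)
print(labels[:5], labels[-5:])
```

On real data the number of groups `k` is not known in advance and usually has to be chosen by trying several values.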
The best-case scenario is a set of compact groups, each with a small standard deviation around its center. A group with a large standard deviation may contain outliers. Outliers behave differently from the rest of the data and could indicate suspicious activity such as fraud, or poor system performance that could affect the entire operation in the near future.
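One way to check how compact each group is can be sketched as follows. The helper `group_spread` is a hypothetical name, and the measure it computes (the root-mean-square distance of a group's points from the group mean) is an assumption used here as a multi-variable stand-in for the standard deviation. The points, labels, and the threshold of 5.0 are made up for the illustration.

```python
import math
from collections import defaultdict


def group_spread(points, labels):
    """Return {label: RMS distance of the group's points from the group mean}."""
    groups = defaultdict(list)
    for p, lab in zip(points, labels):
        groups[lab].append(p)
    spread = {}
    for lab, members in groups.items():
        mean = [sum(col) / len(members) for col in zip(*members)]
        spread[lab] = math.sqrt(
            sum(math.dist(p, mean) ** 2 for p in members) / len(members)
        )
    return spread


# Hand-made example: group 0 is compact, group 1 is widely scattered.
points = [
    [1.0, 1.0], [1.0, 2.0], [2.0, 1.0], [2.0, 2.0],
    [0.0, 0.0], [0.0, 20.0], [20.0, 0.0], [20.0, 20.0],
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

spread = group_spread(points, labels)

# Flag groups whose spread exceeds an arbitrary, illustrative threshold.
wide_groups = [lab for lab, s in spread.items() if s > 5.0]
print(spread, wide_groups)
```

A wide group is only a hint that something deserves a closer look; whether it reflects fraud, a performance problem, or simply natural variation depends on the data.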
The K-means algorithm is the best known of the grouping methods. There are others that could be better than K-means depending on the data. Four examples of classification...