Chapter 1. Data Mining Patterns
A common use of data mining is to detect patterns or rules in data.
The points of interest are the non-obvious patterns that can only be detected using a large dataset. The detection of simpler patterns, such as market basket analysis for purchasing associations or timings, has been possible for some time. Our interest in R programming is in detecting unexpected associations that can lead to new opportunities.
Some patterns are sequential in nature, for example, predicting faults in systems based on past results that are, again, only obvious using large datasets. These will be explored in the next chapter.
This chapter discusses the use of R to discover patterns in datasets' various methods:
- Cluster analysis: This is the process of examining your data and establishing groups of data points that are similar. Cluster analysis can be performed using several algorithms. The different algorithms focus on using different attributes of the data distribution, such as distance between points, density, or statistical ranges.
- Anomaly detection: This is the process of looking at data that appears to be similar but shows differences or anomalies for certain attributes. Anomaly detection is used frequently in the field of law enforcement, fraud detection, and insurance claims.
- Association rules: These are a set of decisions that can be made from your data. Here, we are looking for concrete steps so that if we find one data point, we can use a rule to determine whether another data point will likely exist. Rules are frequently used in market basket approaches. In data mining, we are looking for deeper, non-obvious rules that are present in the data.