KDD stands for Knowledge Discovery from Data, also called Knowledge Discovery in Databases. Many people treat KDD as a synonym for data mining, which is the process of discovering interesting patterns from data; strictly speaking, data mining is one phase within the broader KDD process. The main objective of KDD is to extract hidden, interesting patterns from large databases, data warehouses, the web, and other information repositories. The KDD process has seven major phases:
- Data Cleaning: In this first phase, data is preprocessed. Here, noise is removed, missing values are handled, and outliers are detected.
- Data Integration: In this phase, data from multiple sources is combined into a coherent data store, typically using data migration and ETL (extract, transform, load) tools.
- Data Selection: In this phase, the data relevant to the analysis task is retrieved from the integrated data.
- Data Transformation: In this phase, the data is transformed or consolidated into a form appropriate for mining, for example through normalization, aggregation, or feature construction.
- Data Mining: In this phase, data mining techniques are applied to discover previously unknown and potentially useful patterns.
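- Pattern Evaluation: In this phase, the extracted patterns are evaluated, for example with interestingness measures such as support and confidence, to identify those that represent genuinely useful knowledge.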
- Knowledge Presentation: After pattern evaluation, the discovered knowledge is visualized and presented to business users to support decision-making.
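To make the sequence of phases concrete, the following Python sketch chains them into a single pipeline. It is a minimal, illustrative example under stated assumptions, not a reference implementation: the data, the field names (`customer_id`, `region`, `amount`), the thresholds, and all function names are hypothetical. The mining phase here uses simple item-pair counting with absolute support and confidence, in the spirit of association-rule mining, rather than any particular tool or library.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw data from two sources (illustrative values only).
TRANSACTIONS = [
    {"tid": 1, "customer_id": 10, "items": ["Bread", "Milk"], "amount": 4.5},
    {"tid": 2, "customer_id": 11, "items": ["bread", "butter", "milk"], "amount": 7.0},
    {"tid": 3, "customer_id": 10, "items": None, "amount": 3.0},  # missing items
    {"tid": 4, "customer_id": 12, "items": ["milk", "butter"], "amount": -50.0},  # implausible amount
    {"tid": 5, "customer_id": 11, "items": ["bread", "Milk ", "butter", "milk"], "amount": 5.5},
    {"tid": 6, "customer_id": 13, "items": ["beer", "chips"], "amount": 9.0},
]
CUSTOMERS = {10: "north", 11: "north", 12: "south", 13: "south"}  # hypothetical second source


def clean(records, max_amount=1000.0):
    """Phase 1: drop records with missing values or out-of-range (outlier) amounts."""
    return [
        r for r in records
        if r["items"] and r["amount"] is not None and 0 <= r["amount"] <= max_amount
    ]


def integrate(records, customers):
    """Phase 2: join each transaction with its customer's region from the second source."""
    return [{**r, "region": customers.get(r["customer_id"])} for r in records]


def select(records, region):
    """Phase 3: keep only the task-relevant subset (item lists from one region)."""
    return [r["items"] for r in records if r["region"] == region]


def transform(item_lists):
    """Phase 4: normalize item names and turn each basket into a set (removes duplicates)."""
    return [frozenset(item.strip().lower() for item in items) for items in item_lists]


def mine(baskets, min_support=2):
    """Phase 5: count single items and item pairs; keep pairs with count >= min_support."""
    item_counts = Counter(item for b in baskets for item in b)
    pair_counts = Counter(pair for b in baskets for pair in combinations(sorted(b), 2))
    frequent = {p: c for p, c in pair_counts.items() if c >= min_support}
    return item_counts, frequent


def evaluate(item_counts, frequent_pairs, min_confidence=0.8):
    """Phase 6: turn frequent pairs into rules lhs -> rhs and keep the confident ones."""
    rules = []
    for (a, b), count in frequent_pairs.items():
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # support(lhs and rhs) / support(lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, count, confidence))
    return sorted(rules)


def present(rules):
    """Phase 7: render the surviving rules as readable text for decision-makers."""
    return [f"{lhs} -> {rhs} (support={s}, confidence={c:.2f})" for lhs, rhs, s, c in rules]


def kdd_pipeline(transactions, customers, region="north"):
    """Run all seven phases in order on the hypothetical inputs."""
    baskets = transform(select(integrate(clean(transactions), customers), region))
    item_counts, frequent = mine(baskets)
    return present(evaluate(item_counts, frequent))


if __name__ == "__main__":
    for line in kdd_pipeline(TRANSACTIONS, CUSTOMERS):
        print(line)
```

With this toy data, cleaning drops transactions 3 and 4, selection keeps only the three "north" baskets, and the script prints four rules. The pair (bread, butter) is frequent, but the rule `bread -> butter` is discarded in the evaluation phase because its confidence is only 2/3, which illustrates why pattern evaluation is a separate step from mining.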
The complete KDD process is shown in the following diagram:
KDD is an iterative process: data quality, integration, and transformation are refined over repeated passes to improve the results. Now, let's discuss the SEMMA process.