SEMMA stands for Sample, Explore, Modify, Model, and Assess. It is a sequential data mining process developed by SAS, and it consists of five major phases:
- Sample: In this phase, we identify the relevant databases and merge them. We then select a data sample that is large enough to contain the significant information yet small enough to process efficiently during modeling.
- Explore: In this phase, we study the data, discover relationships among variables, visualize the data, and form initial interpretations.
- Modify: In this phase, the data is prepared for modeling. This involves handling missing values, detecting outliers, transforming features, and creating new features.
- Model: In this phase, the main concern is selecting and applying suitable modeling techniques, such as linear and logistic regression, backpropagation neural networks, k-nearest neighbors (KNN), support vector machines, decision trees, and random forests.
- Assess: In this final phase, the predictive models that have been developed are evaluated with performance measures, such as accuracy or error rate, to judge how useful and reliable they are.
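The five phases above can be sketched end to end in a few lines of plain Python. This is a minimal, standard-library-only sketch under stated assumptions, not SAS's implementation: the two source "databases", the column names (`age`, `income`, `spend`, `churn`), the derived `spend_to_age` feature, and every helper function are hypothetical, the data is synthetic, and a small k-nearest-neighbors classifier stands in for whichever technique the Model phase would actually select.

```python
import random
import statistics

# Hypothetical data: two toy "databases" keyed by customer_id (synthetic).
def make_sources(n=200, seed=0):
    rng = random.Random(seed)
    demographics, transactions = {}, {}
    for cid in range(n):
        age = rng.randint(18, 70)
        # About 10% of incomes are missing, so the Modify phase has work to do.
        income = None if rng.random() < 0.1 else rng.gauss(40 + age, 10)
        demographics[cid] = {"age": age, "income": income}
        spend = rng.gauss(2 * age, 15)
        churn = int(spend + rng.gauss(0, 10) < 80)  # noisy binary target
        if cid % 10:  # assume every tenth customer has no transaction record
            transactions[cid] = {"spend": spend, "churn": churn}
    return demographics, transactions

# Sample: merge the sources on their shared key, then draw a random subset.
def sample_phase(demographics, transactions, frac=0.5, seed=1):
    merged = [{"id": cid, **demographics[cid], **transactions[cid]}
              for cid in demographics if cid in transactions]  # inner join
    return random.Random(seed).sample(merged, int(len(merged) * frac))

# Explore: summary statistics and one pairwise relationship.
def pearson(xs, ys):
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def explore_phase(rows):
    ages = [r["age"] for r in rows]
    spends = [r["spend"] for r in rows]
    return {"mean_age": statistics.fmean(ages),
            "age_spend_corr": pearson(ages, spends),
            "missing_income": sum(r["income"] is None for r in rows)}

# Modify: impute missing values, cap outliers, derive and standardize features.
FEATURES = ["age", "income", "spend_to_age"]

def modify_phase(rows):
    known = [r["income"] for r in rows if r["income"] is not None]
    median = statistics.median(known)
    q1, _, q3 = statistics.quantiles(known, n=4)
    lo, hi = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)  # IQR outlier fences
    out = []
    for r in rows:
        income = median if r["income"] is None else r["income"]  # impute
        income = min(max(income, lo), hi)                        # cap outliers
        out.append({**r, "income": income, "spend_to_age": r["spend"] / r["age"]})
    # Z-score each feature so no single scale dominates the KNN distance.
    for f in FEATURES:
        col = [r[f] for r in out]
        mean, sd = statistics.fmean(col), statistics.pstdev(col)
        for r in out:
            r.setdefault("x", []).append((r[f] - mean) / sd)
    return out

# Model: a majority-class baseline and a k-nearest-neighbors classifier.
def knn_predict(train, x, k=5):
    def dist(r):
        return sum((a - b) ** 2 for a, b in zip(r["x"], x))
    nearest = sorted(train, key=dist)[:k]
    return int(sum(r["churn"] for r in nearest) * 2 > k)  # majority vote

def model_phase(train):
    majority = int(sum(r["churn"] for r in train) * 2 > len(train))
    return {"baseline": lambda x: majority,
            "knn": lambda x: knn_predict(train, x)}

# Assess: accuracy of each model on held-out rows.
def assess_phase(models, test):
    return {name: sum(predict(r["x"]) == r["churn"] for r in test) / len(test)
            for name, predict in models.items()}

if __name__ == "__main__":
    rows = sample_phase(*make_sources())
    print(explore_phase(rows))
    prepared = modify_phase(rows)
    cut = int(0.7 * len(prepared))  # rows are already shuffled by sampling
    print(assess_phase(model_phase(prepared[:cut]), prepared[cut:]))
```

In a real project the Explore phase would also produce plots, and the Assess phase would compare several metrics (and usually several candidate models); a single holdout accuracy is used here only to keep the sketch short.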
The following diagram shows this process:
The preceding diagram shows the steps involved in the SEMMA process. SEMMA emphasizes model building and assessment. Now, let's discuss the CRISP-DM process.