Standards and markup languages
As predictive models become more pervasive, the need to share models and to carry the modeling process through to completion has led to the formalization of the development process and to interchangeable model formats. In this section, we'll review two de facto standards: one covering the data science process and the other specifying an interchangeable format for sharing models between applications.
CRISP-DM
The Cross-Industry Standard Process for Data Mining (CRISP-DM) describes a data-mining process commonly used by data scientists in industry. CRISP-DM breaks the data-mining process into six major phases:
- Business understanding
- Data understanding
- Data preparation
- Modeling
- Evaluation
- Deployment
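The phases are not meant to run strictly in sequence. The following minimal Python sketch makes that concrete by encoding the six phases as an enum and checking whether a sequence of phases follows allowed transitions. The transition table is an assumption based on the commonly published CRISP-DM reference diagram: business and data understanding feed each other, data preparation and modeling loop back and forth, and evaluation either leads to deployment or sends the project back to business understanding. `Phase`, `TRANSITIONS`, and `is_valid_path` are hypothetical names chosen for illustration, not part of any CRISP-DM tooling.

```python
from enum import Enum


class Phase(Enum):
    """The six CRISP-DM phases, in their nominal order."""
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


# Allowed transitions (assumed from the usual CRISP-DM reference diagram).
# Several phases can send the project back a step, which is what makes
# the process iterative. Starting a new project after deployment is not modeled.
TRANSITIONS = {
    Phase.BUSINESS_UNDERSTANDING: {Phase.DATA_UNDERSTANDING},
    Phase.DATA_UNDERSTANDING: {Phase.BUSINESS_UNDERSTANDING, Phase.DATA_PREPARATION},
    Phase.DATA_PREPARATION: {Phase.MODELING},
    Phase.MODELING: {Phase.DATA_PREPARATION, Phase.EVALUATION},
    Phase.EVALUATION: {Phase.BUSINESS_UNDERSTANDING, Phase.DEPLOYMENT},
    Phase.DEPLOYMENT: set(),
}


def is_valid_path(path):
    """Return True if every consecutive pair of phases is an allowed transition."""
    return all(nxt in TRANSITIONS[cur] for cur, nxt in zip(path, path[1:]))


# A single pass that loops once from modeling back to data preparation
run = [
    Phase.BUSINESS_UNDERSTANDING, Phase.DATA_UNDERSTANDING,
    Phase.DATA_PREPARATION, Phase.MODELING, Phase.DATA_PREPARATION,
    Phase.MODELING, Phase.EVALUATION, Phase.DEPLOYMENT,
]
print(is_valid_path(run))                                 # True
print(is_valid_path([Phase.MODELING, Phase.DEPLOYMENT]))  # False: evaluation skipped
```

Representing the process as a transition table rather than a fixed list keeps the loops explicit: a project that jumps straight from modeling to deployment is rejected, while one that revisits data preparation after a first modeling attempt is accepted.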
In the following diagram...