What is CRISP-DM?
CRISP-DM is a tool-neutral, industry-nonspecific process model for navigating the life cycle of a data mining project. It consists of six phases and, within those phases, a total of 24 generic tasks. In the given table, the phases appear as column headings and the generic tasks appear in bold. It is the most widely used process model of its kind. This is especially true among users of Modeler, since the software has historically made explicit references to CRISP-DM in the default structure of its project files, but polls have shown that its popularity extends to many other data miners as well. It was written in the 1990s by a consortium of data miners from numerous companies. Its lead authors were from NCR, DaimlerChrysler, and ISL (later bought by SPSS).
This book is structured around this process model but does not address the CRISP-DM content directly. Since the CRISP-DM consortium is nonprofit, the original documents are widely available on the Web, and reading them in their entirety would be a helpful part of one's professional development as a data miner. Naturally, as a cookbook written for users of Modeler, our focus will be on hands-on tasks.
Business understanding, while critical, is not conducive to a recipe-based format. Because it is such an important topic, it is covered in prose in Appendix, Business Understanding. Data preparation receives most of our attention, with four chapters. Modeling is covered in depth in its own chapter. Since evaluation and deployment often use Modeler in combination with other tools, we have included somewhat fewer recipes for them, but that does not diminish their importance. The final chapter, Modeler Scripting, is not named after a CRISP-DM phase or task but is included at the end because it has the most advanced recipes.