IBM SPSS Modeler Cookbook

If you've already had some experience with IBM SPSS Modeler, this cookbook will help you delve deeper and exploit the incredible potential of this data mining workbench. The recipes come from some of the best brains in the business.

Product type: Paperback
Published: October 2013
Publisher: Packt
ISBN-13: 9781849685467
Length: 382 pages
Edition: 1st

Table of Contents

Preface
1. Data Understanding
2. Data Preparation – Select
3. Data Preparation – Clean
4. Data Preparation – Construct
5. Data Preparation – Integrate and Format
6. Selecting and Building a Model
7. Modeling – Assessment, Evaluation, Deployment, and Monitoring
8. CLEM Scripting
A. Business Understanding
Index

The IBM SPSS Modeler workbench

This book is about the data mining workbench known first as Clementine and now as IBM SPSS Modeler. This and the other workbench-style data mining tools have played a crucial role in making data mining what it is today: a business process rather than a purely technical one. The importance of the workbench is twofold.

Firstly, the workbench plays down the technical side of data mining. It simplifies the use of technology through a user interface that allows the user, in most cases, to ignore the deep technical details, whether this means the method of data access, the design of a graph, or the mechanism and tuning of data mining algorithms. Technical details are simplified and, where possible, sensible universal default settings are used, so that users often need not see any options that reveal the underlying technology, let alone understand what they mean.

This is important because it allows business analysts to perform data mining—a business analyst is someone with expert business knowledge and general-purpose analytical knowledge. A business analyst need not have deep knowledge of data mining algorithms or mathematics, and it can even be a disadvantage to have this knowledge because technical details can distract from focusing on the business problem.

Secondly, the workbench records and highlights the way in which business knowledge has been used to analyze the data. This is why most data mining workbenches use a "visual workflow" approach; the workflow constitutes a record of the route from raw data to analysis, and it also makes it extremely easy to change this processing and re-use it in part or in full. Data mining is an interactive process of applying business and analytical knowledge to data, and the data mining workbench is designed to make this easy.

A brief history of the Clementine workbench

During the 1980s, the School of Cognitive and Computing Studies at the University of Sussex developed an Artificial Intelligence programming environment called Poplog. Used for teaching and research, Poplog was characterized by containing several different AI programming languages and many other AI-related packages, including machine-learning modules. From 1983, Poplog was marketed commercially by Systems Designers Limited (later SD-Scicon), and in 1989, a management buyout created a spin-off company called Integral Solutions Ltd (ISL) to market Poplog and related products. A stream of businesses developed within ISL, applying the machine-learning packages in Poplog to organizations' data, in order to understand and predict customer behavior.

In 1993, Colin Shearer (the then Development and Research Director at ISL) invented the Clementine data mining workbench, basing his designs around the data mining projects recently executed by the company and creating the first workbench modules using Poplog. ISL created a data mining division, led by Colin Shearer, to develop, productize, and market Clementine and its associated services; the initial members were Colin Shearer, Tom Khabaza, and David Watkins. This team used Poplog to develop the first version of Clementine, which was launched in June 1994.

Clementine Version 1 would be considered limited by today's standards; the only algorithms provided were decision trees and neural networks, and it had very limited access to databases. However, the fundamental design features, a low technical burden on the user and a flexible visual record of the analysis, were much as they are today, and Clementine immediately attracted substantial commercial interest. New versions followed at a rate of roughly one major version per year, as shown in the table below. ISL was acquired by SPSS Inc. in December 1998, and SPSS Inc. was acquired by IBM in 2009.

Version 1: Decision tree and neural network algorithms, limited database access, Unix platforms only
Version 2: New Kohonen network and linear regression algorithms, new web graph, improved data manipulation, and supernodes
Version 3: ODBC database access; Unix and Windows platforms
Version 4: Association rules and K-means clustering algorithms
Version 5: Scripting, batch execution, external module interface, client-server architecture (Poplog client and C++ server), and the CRISP-DM project tool
Version 6: Logistic regression algorithm, database pushback, and Clementine application templates
Version 7: Java client including many new features, TwoStep clustering, and PCA/factor analysis algorithms
Version 8: Cluster browser and data audit
Version 9: CHAID and QUEST algorithms and interactive decision tree building
Version 10: Anomaly detection and feature selection algorithms
Version 11: Automated modeling, time series and decision list algorithms, and partial automation of data preparation
Version 12: SVM, Bayesian, and Cox regression algorithms; RFM; and variable importance charts
Version 13: Automated clustering and data preparation, nearest neighbor algorithm, and interactive rule building
Version 14: Boosting and bagging, ensemble browsing, and XML data
Version 15: Entity analytics, social network analysis, and the GLMM algorithm

Version 13 was renamed PASW Modeler, and Version 14 IBM SPSS Modeler. The selection of major new features listed above is necessarily subjective; every new version of Clementine included a large number of enhancements and new features. In particular, data manipulation, data access and export, visualization, and the user interface received a great deal of attention throughout. Perhaps the most significant new release was Version 7, in which the Clementine client was completely rewritten in Java; this was designed by Sheri Gilley and Julian Clinton, and contained a large number of new features while retaining the essential character of the software. Another very important feature of Clementine from Version 6 onwards was database pushback: the ability to translate Clementine operations into SQL so that they could be executed directly by a database engine without extracting the data first. This was primarily the work of Niall McCarroll and Rob Duncan, and it gave Clementine an unusual degree of scalability compared to other data mining software.
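The idea behind database pushback can be illustrated with a toy sketch. This is purely illustrative (the node names and translation logic below are hypothetical and much simpler than Modeler's actual SQL generation): a chain of dataflow nodes is folded into a single SQL statement, so filtering and aggregation run inside the database rather than in the client.

```python
# Illustrative sketch only: fold a chain of dataflow nodes into one SQL
# query, so the database engine does the work instead of the client.
# Node names ("select", "aggregate") are hypothetical, not Modeler's API.

def push_back(table, nodes):
    """Translate a list of (op, arg) nodes into a single SQL string."""
    where = []          # accumulated row-filter conditions
    select = "*"        # columns to return
    group = None        # GROUP BY column, if any
    for op, arg in nodes:
        if op == "select":        # row filter -> WHERE clause
            where.append(arg)
        elif op == "aggregate":   # aggregation -> GROUP BY
            group, select = arg
    sql = f"SELECT {select} FROM {table}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    if group:
        sql += f" GROUP BY {group}"
    return sql

nodes = [("select", "age > 30"),
         ("select", "churned = 1"),
         ("aggregate", ("region", "region, COUNT(*) AS n"))]
print(push_back("customers", nodes))
# SELECT region, COUNT(*) AS n FROM customers
#   WHERE age > 30 AND churned = 1 GROUP BY region
```

The scalability benefit comes from the fact that only the (small) aggregated result crosses the network, rather than every raw row.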

In 1996, ISL collaborated with Daimler-Benz, NCR Teradata, and OHRA to form the "CRISP-DM" consortium, partly funded by a European Union R&D grant in order to create a new data mining methodology, CRISP-DM. The consortium consulted many organizations through its Special Interest Group and released CRISP-DM Version 1.0 in 1999. CRISP-DM has been integrated into the workbench since that time and has been very widely used, sufficiently to justify calling it the industry standard.

The core Clementine analytics are designed to handle structured data: numeric, coded, and string data of the sort typically found in relational databases. However, in Clementine Version 4, a prototype text mining module was produced in collaboration with Brighton University, though it was never released as a commercial product. In 2002, SPSS acquired LexiQuest, a text mining company, and integrated the LexiQuest text mining technology into a product called Text Mining for Clementine, an add-on module for Version 7. Text mining is accomplished in the workbench by extracting structured data from unstructured (free text) data, and then analyzing it with the standard features of the workbench.
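The principle described above, turning free text into structured fields that ordinary analytics can consume, can be sketched in a few lines. This is a toy illustration under simplifying assumptions (a fixed term list and naive tokenization), not how LexiQuest's linguistic extraction actually works:

```python
# Toy illustration of the text-mining principle: turn unstructured text
# into structured rows (term counts per record), which standard
# structured-data analysis can then consume.
from collections import Counter

records = [
    "delivery was late and the product arrived damaged",
    "great product, fast delivery",
]

# Hypothetical extracted concepts; real text mining derives these
# linguistically rather than from a hand-written list.
terms = ["delivery", "product", "late", "damaged"]

structured = []
for text in records:
    tokens = text.lower().replace(",", "").split()
    counts = Counter(tokens)
    structured.append({t: counts[t] for t in terms})

print(structured)
```

Each dictionary in `structured` is effectively a row of numeric columns, ready for the same modeling nodes that handle relational data.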
