IBM SPSS Modeler Cookbook

If you've already had some experience with IBM SPSS Modeler, this cookbook will help you delve deeper and exploit the incredible potential of this data mining workbench. The recipes come from some of the best brains in the business.

Product type: Paperback
Published: October 2013
Publisher: Packt
ISBN-13: 9781849685467
Length: 382 pages
Edition: 1st

Table of Contents

Preface
1. Data Understanding
2. Data Preparation – Select
3. Data Preparation – Clean
4. Data Preparation – Construct
5. Data Preparation – Integrate and Format
6. Selecting and Building a Model
7. Modeling – Assessment, Evaluation, Deployment, and Monitoring
8. CLEM Scripting
A. Business Understanding
Index

The IBM SPSS Modeler workbench

This book is about the data mining workbench known first as Clementine and now as IBM SPSS Modeler. This and the other workbench-style data mining tools have played a crucial role in making data mining what it is today: a business process rather than a purely technical one. The importance of the workbench is twofold.

Firstly, the workbench plays down the technical side of data mining. It simplifies the use of technology through a user interface that allows the user, in most cases, to ignore the deep technical details, whether this means the method of data access, the design of a graph, or the mechanism and tuning of data mining algorithms. Technical details are simplified and, where possible, sensible universal default settings are used, so that users often need not see any options that reveal the underlying technology, let alone understand what they mean.

This is important because it allows business analysts to perform data mining—a business analyst is someone with expert business knowledge and general-purpose analytical knowledge. A business analyst need not have deep knowledge of data mining algorithms or mathematics, and it can even be a disadvantage to have this knowledge because technical details can distract from focusing on the business problem.

Secondly, the workbench records and highlights the way in which business knowledge has been used to analyze the data. This is why most data mining workbenches use a "visual workflow" approach; the workflow constitutes a record of the route from raw data to analysis, and it also makes it extremely easy to change this processing and re-use it in part or in full. Data mining is an interactive process of applying business and analytical knowledge to data, and the data mining workbench is designed to make this easy.

A brief history of the Clementine workbench

During the 1980s, the School of Cognitive and Computing Studies at the University of Sussex developed an Artificial Intelligence programming environment called Poplog. Used for teaching and research, Poplog was characterized by containing several different AI programming languages and many other AI-related packages, including machine-learning modules. From 1983, Poplog was marketed commercially by Systems Designers Limited (later SD-Scicon), and in 1989, a management buyout created a spin-off company called Integral Solutions Ltd (ISL) to market Poplog and related products. A stream of businesses developed within ISL, applying the machine-learning packages in Poplog to organizations' data, in order to understand and predict customer behavior.

In 1993, Colin Shearer (the then Development and Research Director at ISL) invented the Clementine data mining workbench, basing his designs around the data mining projects recently executed by the company and creating the first workbench modules using Poplog. ISL created a data mining division, led by Colin Shearer, to develop, productize, and market Clementine and its associated services; the initial members were Colin Shearer, Tom Khabaza, and David Watkins. This team used Poplog to develop the first version of Clementine, which was launched in June 1994.

Clementine Version 1 would be considered limited by today's standards; the only algorithms provided were decision trees and neural networks, and it had very limited access to databases. However, the fundamental design features, a low technical burden on the user and a flexible visual record of the analysis, were much as they are today, and Clementine immediately attracted substantial commercial interest. New versions followed at a rate of roughly one major version per year, as shown in the table below. ISL was acquired by SPSS Inc. in December 1998, and SPSS Inc. was acquired by IBM in 2009.

Version 1: Decision tree and neural network algorithms, limited database access, Unix platforms only
Version 2: New Kohonen network and linear regression algorithms, new web graph, improved data manipulation, and supernodes
Version 3: ODBC database access; Unix and Windows platforms
Version 4: Association rules and K-means clustering algorithms
Version 5: Scripting, batch execution, external module interface, client-server architecture (Poplog client and C++ server), and the CRISP-DM project tool
Version 6: Logistic regression algorithm, database pushback, and Clementine application templates
Version 7: Java client including many new features, TwoStep clustering, and PCA/factor analysis algorithms
Version 8: Cluster browser and data audit
Version 9: CHAID and QUEST algorithms and interactive decision tree building
Version 10: Anomaly detection and feature selection algorithms
Version 11: Automated modeling, time series and decision list algorithms, and partial automation of data preparation
Version 12: SVM, Bayesian, and Cox regression algorithms; RFM; and variable importance charts
Version 13: Automated clustering and data preparation, nearest neighbor algorithm, and interactive rule building
Version 14: Boosting and bagging, ensemble browsing, and XML data
Version 15: Entity analytics, social network analysis, and the GLMM algorithm

Version 13 was renamed PASW Modeler, and Version 14 IBM SPSS Modeler. The selection of major new features listed above is necessarily subjective; every new version of Clementine included a large number of enhancements and new features. In particular, data manipulation, data access and export, visualization, and the user interface received a great deal of attention throughout. Perhaps the most significant new release was Version 7, in which the Clementine client was completely rewritten in Java; this was designed by Sheri Gilley and Julian Clinton, and contained a large number of new features while retaining the essential character of the software. Another very important feature of Clementine from Version 6 onwards was database pushback: the ability to translate Clementine operations into SQL so that they could be executed directly by a database engine without extracting the data first. This was primarily the work of Niall McCarroll and Rob Duncan, and it gave Clementine an unusual degree of scalability compared to other data mining software.
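The idea behind database pushback can be illustrated with a toy sketch. This is purely illustrative (the node names and translation logic below are hypothetical and much simpler than Modeler's actual SQL generation): a chain of dataflow nodes is folded into a single SQL statement, so filtering and aggregation run inside the database rather than in the client.

```python
# Illustrative sketch only: fold a chain of dataflow nodes into one SQL
# query, so the database engine does the work instead of the client.
# Node names ("select", "aggregate") are hypothetical, not Modeler's API.

def push_back(table, nodes):
    """Translate a list of (op, arg) nodes into a single SQL string."""
    where = []          # accumulated row-filter conditions
    select = "*"        # columns to return
    group = None        # GROUP BY column, if any
    for op, arg in nodes:
        if op == "select":        # row filter -> WHERE clause
            where.append(arg)
        elif op == "aggregate":   # aggregation -> GROUP BY
            group, select = arg
    sql = f"SELECT {select} FROM {table}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    if group:
        sql += f" GROUP BY {group}"
    return sql

nodes = [("select", "age > 30"),
         ("select", "churned = 1"),
         ("aggregate", ("region", "region, COUNT(*) AS n"))]
print(push_back("customers", nodes))
# SELECT region, COUNT(*) AS n FROM customers
#   WHERE age > 30 AND churned = 1 GROUP BY region
```

The scalability benefit comes from the fact that only the (small) aggregated result crosses the network, rather than every raw row.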

In 1996, ISL collaborated with Daimler-Benz, NCR Teradata, and OHRA to form the "CRISP-DM" consortium, partly funded by a European Union R&D grant in order to create a new data mining methodology, CRISP-DM. The consortium consulted many organizations through its Special Interest Group and released CRISP-DM Version 1.0 in 1999. CRISP-DM has been integrated into the workbench since that time and has been very widely used, sufficiently to justify calling it the industry standard.

The core Clementine analytics are designed to handle structured data: numeric, coded, and string data of the sort typically found in relational databases. However, in Clementine Version 4, a prototype text mining module was produced in collaboration with Brighton University, though it was never released as a commercial product. In 2002, SPSS acquired LexiQuest, a text mining company, and integrated the LexiQuest text mining technology into a product called Text Mining for Clementine, an add-on module for Version 7. Text mining is accomplished in the workbench by extracting structured data from unstructured (free text) data, and then analyzing it with the standard features of the workbench.
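The principle described above, turning free text into structured fields that ordinary analytics can consume, can be sketched in a few lines. This is a toy illustration under simplifying assumptions (a fixed term list and naive tokenization), not how LexiQuest's linguistic extraction actually works:

```python
# Toy illustration of the text-mining principle: turn unstructured text
# into structured rows (term counts per record), which standard
# structured-data analysis can then consume.
from collections import Counter

records = [
    "delivery was late and the product arrived damaged",
    "great product, fast delivery",
]

# Hypothetical extracted concepts; real text mining derives these
# linguistically rather than from a hand-written list.
terms = ["delivery", "product", "late", "damaged"]

structured = []
for text in records:
    tokens = text.lower().replace(",", "").split()
    counts = Counter(tokens)
    structured.append({t: counts[t] for t in terms})

print(structured)
```

Each dictionary in `structured` is effectively a row of numeric columns, ready for the same modeling nodes that handle relational data.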
