IBM SPSS Modeler Essentials: Effective techniques for building powerful data mining and predictive analytics solutions

By Keith McCormick and Jesus Salcedo
Paperback, Dec 2017, 238 pages, 1st Edition

Chapter 1. Introduction to Data Mining and Predictive Analytics

IBM SPSS Modeler is an interactive data mining workbench composed of multiple tools and technologies to support the entire data mining process. In this first chapter, readers will be introduced to the concepts of data mining, CRISP-DM, which is a recipe for doing data mining the right way, and a case study outlining the data mining process. The chapter topics are as follows:

  • Introduction to data mining
  • CRISP-DM overview
  • The data mining process (as a case study)

Introduction to data mining


In this chapter, we will place IBM SPSS Modeler and its use in a broader context. Modeler was developed as a tool to perform data mining. Although the phrase predictive analytics is more common now, when Modeler was first developed in the 1990s, this type of analytics was almost universally called data mining. The use of the phrase data mining has evolved a bit since then to emphasize the exploratory aspect, especially in the context of big data and sometimes with a particular emphasis on the mining of private data that has been collected. This will not be our use of the term. Data mining can be defined in the following way:

Data mining is the search of data, accumulated during the normal course of doing business, in order to find and confirm the existence of previously unknown relationships that can produce positive and verifiable outcomes through the deployment of predictive models when applied to new data.

Several points are worth emphasizing:

  • The data is not new
  • The data that can solve the problem was not collected solely to perform data mining
  • The data miner is not testing known relationships (neither hypotheses nor hunches) against the data
  • The patterns must be verifiable
  • The resulting models must be capable of something useful
  • The resulting models must actually work when deployed on new data

In the late 1990s, a process was developed called the Cross Industry Standard Process for Data Mining (CRISP-DM). We will be drawing heavily from that tradition in this chapter, and CRISP-DM can be a powerful way to organize your work in Modeler. It is our use of this process in organizing this book's material that prompts us to use the term data mining. It is worth noting that the team that first developed Modeler, originally called Clementine, and the team that wrote CRISP-DM have some members in common.

CRISP-DM overview


CRISP-DM is considered to be the de facto standard for conducting a data mining project. Starting with the Business Understanding phase and ending with the Deployment phase, this six-phase process has a total of 24 tasks. Do not settle for knowing only the names of the six phases; it is well worth the effort to familiarize yourself with all 24 tasks. The diagram shown next illustrates the six phases of the CRISP-DM process model and the following pages will discuss each of these phases:

Business Understanding

The Business Understanding phase is focused on good problem definition and ensuring that you are solving the business's problem. You must begin from a business perspective and business knowledge, and proceed by converting this knowledge into a data mining problem definition. You will not be performing the actual Business Understanding in Modeler, as such, but Modeler allows you to organize supporting material such as Word documents and PowerPoint presentations as part of a Modeler project file. You don't need to organize this material in a project file, but you do need to remember to do a proper job at this phase. For more detailed information on each task within a phase, refer to the CRISP-DM document itself. It is free and readily available on the internet.

The four tasks in this phase are:

  • Determine business objectives
  • Assess situation
  • Determine data mining goals
  • Produce project plan

Data Understanding

Modeler has numerous resources for exploring your data in preparation for the other phases. We will demonstrate a number of these in Chapter 3, Importing Data into Modeler; Chapter 4, Data Quality and Exploration; and Chapter 8, Looking for Relationships Between Fields. The Data Understanding phase includes activities for getting familiar with the data as well as data collection and data quality. The four Data Understanding tasks are:

  • Collect initial data
  • Describe data
  • Explore data
  • Verify data quality

Data Preparation

The Data Preparation phase covers all activities to construct the final dataset (the data that will be fed into the modeling tool(s)) from the initial raw data. Data Preparation is often described as the most labor-intensive phase for the data analyst. It is terribly important that Data Preparation is done well, and a substantial amount of this book is dedicated to it. We cover cleaning, selecting, integrating, and constructing data in Chapter 5, Cleaning and Selecting Data; Chapter 6, Combining Data Files; and Chapter 7, Deriving New Fields, respectively. However, a book dedicated to the basics of data mining can really only start you on your journey when it comes to Data Preparation, since there are so many ways in which you can improve and prepare data. When you are ready for a more advanced treatment of this topic, there are two resources that will go into Data Preparation in much more depth, and both have extensive Modeler software examples: The IBM SPSS Modeler Cookbook (Packt Publishing) and Effective Data Preparation (Cambridge University Press).

The five Data Preparation tasks are:

  • Select data
  • Clean data
  • Construct data
  • Integrate data
  • Format data

Modeling

The Modeling phase is probably what you expect it to be: the phase where the modeling algorithms move to the forefront. In many ways, this is the easiest phase, as the algorithms do a lot of the work if you have done an excellent job on the prior phases and you've done a good job translating the business problem into a data mining problem. Although the algorithms do the heavy lifting in this phase, it is generally considered the most intimidating, and it is easy to see why. There is an overwhelming number of algorithms to choose from. Even in a well-curated workbench such as Modeler, there are dozens of choices. Open source options such as R have hundreds of choices. While this book is not an algorithms guide, and even though it is impossible to offer a chapter on each algorithm, Chapter 9, Introduction to Modeling Options in IBM SPSS Modeler, should be very helpful in understanding, at a high level, what options are available in Modeler. Also, in Chapter 10, Decision Tree Models, we go through a thorough demonstration of one modeling technique, decision trees, to orient you to modeling in Modeler.

The four tasks in this phase are:

  • Select modeling technique
  • Generate test design
  • Build model
  • Assess model

Evaluation

At this stage in the project you have built a model (or models) that appears to be of high quality, from a data analysis perspective. Before proceeding to final deployment of the model, it is important to more thoroughly evaluate the model—to be certain it properly achieves the business objectives.

Evaluation is frequently confused with model assessment—the last task of the Modeling phase. Assess model is all about the data analysis perspective and includes metrics such as model accuracy. The authors of CRISP-DM considered calling this phase business evaluation because it has to be conducted in the language of the business and using the metrics of the business as indicators of success. Given the nature of this book, and its emphasis on the point and click operation of Modeler, there will be virtually no opportunity to practice this phase, but in real world projects it is a critical phase.

The three tasks in this phase are:

  • Evaluate results
  • Review process
  • Determine next steps

Deployment

Creation of the model is generally not the end of the project. Depending on the requirements, the Deployment phase can be as simple as generating a report or as complex as implementing a repeatable data mining process. Given the software focus of this book and the spirit of sticking to the basics, we will really only cover using models for the scoring of new data. Real world deployment is much more complex and a complex deployment can more than double the length of a project. Modeler's capabilities in this area go far beyond what we will be able to show in this book. The final chapter of this book, Chapter 11, Model Assessment and Scoring, briefly talks about some of these issues.

However, it is not unusual for the deployment team to be different than the modeling team, and the responsibility may fall to team members with more of an IT focus. The IBM software stack offers dedicated tools for complex deployment scenarios. IBM Collaboration and Deployment Services has such advanced features.

The four tasks in the Deployment phase are:

  • Plan deployment
  • Plan monitoring and maintenance
  • Produce final report
  • Review project
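
The six phases and their tasks, as listed in the preceding sections, can be written down as a simple checklist. The following is a minimal Python sketch and not part of Modeler or of the CRISP-DM document itself; the dictionary layout and the `total_tasks` helper are illustrative assumptions, while the phase and task names are taken from this chapter.

```python
# Phases and tasks as listed in this chapter. The data structure itself
# is an illustrative assumption, not something Modeler or CRISP-DM defines.
CRISP_DM = {
    "Business Understanding": [
        "Determine business objectives",
        "Assess situation",
        "Determine data mining goals",
        "Produce project plan",
    ],
    "Data Understanding": [
        "Collect initial data",
        "Describe data",
        "Explore data",
        "Verify data quality",
    ],
    "Data Preparation": [
        "Select data",
        "Clean data",
        "Construct data",
        "Integrate data",
        "Format data",
    ],
    "Modeling": [
        "Select modeling technique",
        "Generate test design",
        "Build model",
        "Assess model",
    ],
    "Evaluation": [
        "Evaluate results",
        "Review process",
        "Determine next steps",
    ],
    "Deployment": [
        "Plan deployment",
        "Plan monitoring and maintenance",
        "Produce final report",
        "Review project",
    ],
}


def total_tasks(process):
    """Count the tasks across all phases."""
    return sum(len(tasks) for tasks in process.values())


# Print a one-line summary per phase, then the overall task count.
for number, (phase, tasks) in enumerate(CRISP_DM.items(), start=1):
    print(f"{number}. {phase}: {len(tasks)} tasks")
print("Total tasks:", total_tasks(CRISP_DM))
```

A checklist like this can be handy for tracking which tasks a project has actually completed, rather than only which phase it is in.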

Learning more about CRISP-DM

Here are five great resources to learn more about CRISP-DM:

The data mining process (as a case study)


As Chapter 9, Introduction to Modeling Options in IBM SPSS Modeler, will illustrate, there are many different types of data mining projects. For example, you may wish to create customer segments based on products purchased or service usage, so that you can develop targeted advertising campaigns. Or you may want to determine where to better position products in your store, based on customer purchase patterns. Or you may want to predict which students will drop out of school, so that you can provide additional services before this happens.

In this book, we will be using a dataset where we are trying to predict which people have incomes above or below $50,000. We may be trying to do this because we know that people with incomes above $50,000 are much more likely to purchase our products, given that previous work found that income was the most important predictor regarding product purchase. The point is that regardless of the actual data that we are using, the principles that we will be showing apply to an infinite number of data mining problems, whether you are trying to determine which customers will purchase a product, or when you will need to replace an elevator, or how many hotel rooms will be booked on a given date, or what additional complications might occur during surgery, and so on.

As was mentioned previously, Modeler supports the entire data mining process. The figure shown next illustrates exactly how Modeler can be used to compartmentalize each aspect of the CRISP-DM process model:

In Chapter 2, The Basics of Using IBM SPSS Modeler, you will become familiar with the Modeler graphical user interface. In this chapter, we will be using screenshots to illustrate how Modeler represents various data mining activities. The following figures in this chapter therefore just provide an overview of how different tasks will look within Modeler. For the moment, do not worry about how each image was created; you will see exactly how to create each of these in later chapters.

First and foremost, every data mining project will need to begin with well-defined business objectives. This is crucial for determining what you are trying to accomplish or learn from a project, and how to translate this into data mining goals. Once this is done, you will need to assess the current business situation and develop a project plan that is reasonable given the data and time constraints.

Once business and data mining objectives are well defined, you will need to collect the appropriate data. Chapter 3, Importing Data into Modeler, will focus on how to bring data into Modeler. Remember that data mining typically uses data that was collected during the normal course of doing business; it is therefore crucial that the data you are using can really address the business and data mining goals.

Once you have data, it is very important to describe and assess its quality. Chapter 4, Data Quality and Exploration, will focus on how to assess data quality using the Data Audit node.
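
To give a rough sense of what a data quality audit reports for a single field, here is a minimal Python sketch. It is only an analogy: it does not use Modeler or reproduce the Data Audit node's output, and the `audit` function, the `sample` records, and the `age` field are all hypothetical.

```python
from statistics import mean


def audit(rows, field):
    """Summarize one numeric field: record count, missing count, range, mean."""
    values = [row.get(field) for row in rows]
    present = [value for value in values if value is not None]
    return {
        "field": field,
        "records": len(values),
        "missing": len(values) - len(present),
        "min": min(present),
        "max": max(present),
        "mean": mean(present),
    }


# Hypothetical sample; None marks a missing value.
sample = [{"age": 25}, {"age": None}, {"age": 47}, {"age": 36}]
report = audit(sample, "age")
print(report)
```

Even a summary this small surfaces the two questions that matter at this stage: how much of the field is missing, and whether the observed range is plausible.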

Once the Data Understanding phase has been completed, it is time to move on to the Data Preparation phase. The Data Preparation phase is by far the most time-consuming and creative part of a data mining project. This is because, as was mentioned previously, we are using data that was collected during the normal course of doing business. The data will therefore not be clean: it will have errors, it will include information that is not relevant, it will have to be restructured into an appropriate format, and you will need to create many new variables that extract important information. Because of the importance of this phase, we have devoted several chapters to addressing these issues. Chapter 5, Cleaning and Selecting Data, will focus on how to select the appropriate cases, by using the Select node, and how to clean data by using the Distinct and Reclassify nodes.
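
The three operations just mentioned (selecting cases, removing duplicates, and reclassifying values) can be sketched in a few lines of Python. This is an analogy under stated assumptions, not Modeler code: the function names `select_cases`, `distinct`, and `reclassify`, the sample records, and the `EDUCATION_GROUPS` mapping are all invented for illustration.

```python
# Hypothetical records; field names and values are invented for illustration.
records = [
    {"id": 1, "age": 34, "education": "HS-grad"},
    {"id": 2, "age": 16, "education": "10th"},
    {"id": 1, "age": 34, "education": "HS-grad"},  # duplicate of the first record
    {"id": 3, "age": 51, "education": "Masters"},
]


def select_cases(rows, condition):
    """Keep only the records for which the condition is true."""
    return [row for row in rows if condition(row)]


def distinct(rows, key_fields):
    """Keep the first record seen for each combination of key field values."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def reclassify(rows, field, mapping, new_field):
    """Map the values of one field into broader categories in a new field."""
    return [{**row, new_field: mapping.get(row[field], row[field])} for row in rows]


# Hypothetical category mapping.
EDUCATION_GROUPS = {"10th": "No diploma", "HS-grad": "High school", "Masters": "Graduate"}

adults = select_cases(records, lambda row: row["age"] >= 18)
unique_adults = distinct(adults, ["id"])
prepared = reclassify(unique_adults, "education", EDUCATION_GROUPS, "education_group")
print(prepared)
```

Each helper returns a new list rather than modifying its input, which loosely mirrors how data flows from one step to the next while the source data stays untouched.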

Chapter 6, Combining Data Files, will continue to focus on the Data Preparation phase by using both the Append and Merge nodes to integrate various data files.
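
Conceptually, appending stacks records from files that share the same fields, while merging combines fields from files that share a key. The Python sketch below illustrates that distinction only; it is not Modeler code, the `append` and `merge` helpers and the sample data are hypothetical, and a key-based inner join is assumed purely for simplicity.

```python
def append(*tables):
    """Stack records from several tables that share the same fields."""
    combined = []
    for table in tables:
        combined.extend(table)
    return combined


def merge(left, right, key):
    """Join two tables on a key, keeping only keys present in both (inner join)."""
    right_by_key = {row[key]: row for row in right}
    return [
        {**row, **right_by_key[row[key]]}
        for row in left
        if row[key] in right_by_key
    ]


# Hypothetical monthly spend files and a demographics file.
january = [{"id": 1, "spend": 120}, {"id": 2, "spend": 75}]
february = [{"id": 3, "spend": 40}]
demographics = [{"id": 1, "region": "North"}, {"id": 3, "region": "South"}]

all_spend = append(january, february)           # more records, same fields
merged = merge(all_spend, demographics, "id")   # same records, more fields
print(merged)
```

Note that customer 2 drops out of `merged` because no demographic record exists for that key; whether that is acceptable is a Data Preparation decision, not a technical detail.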

Finally, Chapter 7, Deriving New Fields, will focus on constructing additional fields by using the Derive node.
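
Constructing a new field amounts to computing a value for every record from fields that already exist. The following Python sketch shows the idea; it is not Modeler code, and the `derive` helper, the field names, and the formula are hypothetical.

```python
def derive(rows, new_field, formula):
    """Return copies of the records with one additional computed field."""
    return [{**row, new_field: formula(row)} for row in rows]


# Hypothetical records with two existing fields.
people = [
    {"capital_gain": 5000, "capital_loss": 0},
    {"capital_gain": 0, "capital_loss": 1200},
]

# The new field combines the two existing ones.
with_net = derive(
    people,
    "net_capital",
    lambda row: row["capital_gain"] - row["capital_loss"],
)
print(with_net)
```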

At this point we will be ready to begin exploring relationships within the data. In Chapter 8, Looking for Relationships Between Fields, we will use the Distribution, Matrix, Histogram, Means, Plot, and Statistics nodes to uncover and understand simple relationships between variables.

Once the Data Preparation phase has been completed, we will move on to the Modeling phase. Chapter 9, Introduction to Modeling Options in IBM SPSS Modeler, will introduce the various types of models available in Modeler and then provide an overview of the predictive models. It will also discuss how to select a modeling technique. Chapter 10, Decision Tree Models, will cover the theory behind decision tree models and focus specifically on how to build a CHAID model. We will also use a Partition node to generate a test design; this is extremely important because only through replication can we determine whether we have a verifiable pattern.
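
The idea behind a test design is to hold back part of the data so that a pattern found in one part can be checked against records the model has never seen. Below is a minimal Python sketch of a random two-way split. It is an illustration only and does not reproduce the Partition node; the `partition` function, the 70/30 split, and the seed value are assumptions.

```python
import random


def partition(rows, train_fraction=0.7, seed=42):
    """Randomly split records into training and testing sets.

    A fixed seed makes the split repeatable, so the same records land in
    the same partition every time the function is called.
    """
    rng = random.Random(seed)
    shuffled = rows[:]          # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical dataset of 100 record identifiers.
rows = list(range(100))
train, test = partition(rows)
print(len(train), "training records,", len(test), "testing records")
```

A model is then built on `train` only. If it performs about as well on `test`, the pattern has replicated; if performance collapses, the model has probably fitted noise.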

Chapter 11, Model Assessment and Scoring, is the final chapter in this book and it will provide readers with the opportunity to assess and compare models using the Analysis node. The Evaluation node will also be introduced as a way to evaluate model results.
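
At its simplest, assessing a classification model means comparing predicted categories against the actual ones. The Python sketch below computes overall accuracy and the counts behind a confusion matrix. It is an analogy and does not reproduce the output of the Analysis or Evaluation nodes; the `assess` function and the sample values are hypothetical, with labels loosely based on the income case study.

```python
from collections import Counter


def assess(actual, predicted):
    """Return overall accuracy and a count of each (actual, predicted) pair."""
    matrix = Counter(zip(actual, predicted))
    correct = sum(count for (a, p), count in matrix.items() if a == p)
    return correct / len(actual), matrix


# Hypothetical actual and predicted income categories for five people.
actual = [">50K", "<=50K", "<=50K", ">50K", "<=50K"]
predicted = [">50K", "<=50K", ">50K", "<=50K", "<=50K"]

accuracy, matrix = assess(actual, predicted)
print(f"Accuracy: {accuracy:.0%}")
for (a, p), count in sorted(matrix.items()):
    print(f"actual {a}, predicted {p}: {count}")
```

The pair counts matter as much as the headline accuracy, since the two kinds of error rarely cost the business the same amount.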

Finally, we will spend some time discussing how to score new data and export those results to another application using the Flat File node.
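
Scoring means applying an existing model to records it has not seen and attaching a prediction to each one; exporting writes the result somewhere another application can read it. The Python sketch below shows the shape of that step with a delimited text output. It is not Modeler code: the `score` rule is a hand-written stand-in for a trained model, and the field names, threshold values, and `export_scores` helper are all hypothetical.

```python
import csv
import io


def score(row):
    """Hypothetical hand-written rule standing in for a trained model."""
    if row["education_years"] >= 13 and row["hours_per_week"] >= 40:
        return ">50K"
    return "<=50K"


def export_scores(rows, handle):
    """Write each record plus its prediction as comma-delimited text."""
    fieldnames = list(rows[0]) + ["prediction"]
    writer = csv.DictWriter(handle, fieldnames=fieldnames)
    writer.writeheader()
    for row in rows:
        writer.writerow({**row, "prediction": score(row)})


# Hypothetical new records to be scored.
new_data = [
    {"id": 101, "education_years": 16, "hours_per_week": 45},
    {"id": 102, "education_years": 10, "hours_per_week": 40},
]

# An in-memory buffer keeps the sketch self-contained; a real export
# would pass a file opened with open(path, "w", newline="").
buffer = io.StringIO()
export_scores(new_data, buffer)
print(buffer.getvalue())
```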

Summary


In this chapter, you were introduced to the notion of data mining and the CRISP-DM process model. You were also provided with an overview of the data mining process, along with previews of what to expect in the upcoming chapters.

In the next chapter, you will learn about the different components of the Modeler graphical user interface. You will also learn how to build streams. Finally, you will be introduced to various help options.


Key benefits

  • Get up and running with IBM SPSS Modeler without going into too much depth
  • Identify interesting relationships within your data and build effective data mining and predictive analytics solutions
  • A quick, easy-to-follow guide to give you a fundamental understanding of SPSS Modeler, written by the best in the business

Description

IBM SPSS Modeler allows you to quickly and efficiently use predictive analytics and gain insights from your data. With almost 25 years of history, Modeler is the most established and comprehensive data mining workbench available. Since it is popular in corporate settings, widely available in university settings, and highly compatible with all the latest technologies, it is the perfect way to start your data science and machine learning journey. This book takes a detailed, step-by-step approach to introducing data mining using the de facto standard process, CRISP-DM, and Modeler’s easy-to-learn “visual programming” style. You will learn how to read data into Modeler, assess data quality, prepare your data for modeling, find interesting patterns and relationships within your data, and export your predictions. Using a single case study throughout, this intentionally short and focused book sticks to the essentials. The authors have drawn upon their decades of teaching thousands of new users to choose those aspects of Modeler that you should learn first, so that you get off to a good start using proven best practices. This book provides an overview of various popular data modeling techniques and presents a detailed case study of how to use CHAID, a decision tree model. Assessing a model’s performance is as important as building it; this book will also show you how to do that. Finally, you will see how you can score new data and export your predictions. By the end of this book, you will have a firm understanding of the basics of data mining and how to effectively use Modeler to build predictive models.

Who is this book for?

This book is ideal for those who are new to SPSS Modeler and want to start using it as quickly as possible, without going into too much detail. An understanding of basic data mining concepts will help you get the best out of the book.

What you will learn

  • Understand the basics of data mining and familiarize yourself with Modeler's visual programming interface
  • Import data into Modeler and learn how to properly declare metadata
  • Obtain summary statistics and audit the quality of your data
  • Prepare data for modeling by selecting and sorting cases, identifying and removing duplicates, combining data files, and modifying and creating fields
  • Assess simple relationships using various statistical and graphing techniques
  • Get an overview of the different types of models available in Modeler
  • Build a decision tree model and assess its results
  • Score new data and export predictions

Product Details

Publication date: Dec 26, 2017
Length: 238 pages
Edition: 1st
Language: English
ISBN-13: 9781788291118


Table of Contents

11 Chapters
  • Introduction to Data Mining and Predictive Analytics
  • The Basics of Using IBM SPSS Modeler
  • Importing Data into Modeler
  • Data Quality and Exploration
  • Cleaning and Selecting Data
  • Combining Data Files
  • Deriving New Fields
  • Looking for Relationships Between Fields
  • Introduction to Modeling Options in IBM SPSS Modeler
  • Decision Tree Models
  • Model Assessment and Scoring

