IBM SPSS Modeler Cookbook

IBM SPSS Modeler Cookbook: If you've already had some experience with IBM SPSS Modeler this cookbook will help you delve deeper and exploit the incredible potential of this data mining workbench. The recipes come from some of the best brains in the business.

Authors: Keith McCormick, Abbott
Rating: 4.4 (20 Ratings)
eBook, Oct 2013, 382 pages, 1st Edition
eBook: $37.99 $42.99
Paperback: $70.99
Subscription: Free Trial, renews at $19.99 per month

What do you get with eBook?

  • Instant access to your Digital eBook purchase
  • Download this book in EPUB and PDF formats
  • Access this title in our online reader with advanced features
  • DRM FREE - Read whenever, wherever and however you want

IBM SPSS Modeler Cookbook

Chapter 2. Data Preparation – Select

In this chapter, we will cover:

  • Using the Feature Selection node creatively to remove or decapitate perfect predictors
  • Running a Statistics node on an anti-join to evaluate the potential missing data
  • Evaluating the use of sampling for speed
  • Removing redundant variables using correlation matrices
  • Selecting variables using the CHAID Modeling node
  • Selecting variables using the Means node
  • Selecting variables using single-antecedent Association Rules

Introduction

This chapter focuses on just the first task, Select, of the data preparation phase:

Decide on the data to be used for analysis. Criteria include relevance to the data mining goals, quality, and technical constraints such as limits on data volume or data types. Note that data selection covers selection of attributes (columns) as well as selection of records (rows) in a table.

Ideally, data mining empowers business people to discover valuable patterns in large quantities of data, to develop useful models and integrate them into the business quickly and easily. The name data mining suggests that large quantities of data will be involved, that the object is to extract rare and elusive bits of the data, and that data mining calls for working with data in bulk—no sampling.

New data miners are often struck by how much selection and sampling is actually done. For some, the stereotypical data miner dives in and looks at everything. It is unclear how such an unfocused search would...

Using the Feature Selection node creatively to remove or decapitate perfect predictors

In this recipe, we will identify perfect or near-perfect predictors in order to ensure that they do not contaminate our model. Perfect predictors earn their name by being correct 100 percent of the time, usually indicating circular logic and not a prediction of value. It is a common and serious problem.

When this occurs we have accidentally allowed information into the model that could not possibly be known at the time of the prediction. Everyone 30 days late on their mortgage receives a late letter, but receiving a late letter is not a good predictor of lateness because their lateness caused the letter, not the other way around.

The rather colorful term decapitate is borrowed from the data miner Dorian Pyle. It is a reference to the fact that perfect predictors will be found at the top of any list of key drivers ("caput" means head in Latin). Therefore, to decapitate is to remove the variable...
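A minimal sketch of the idea in plain Python, outside Modeler. It does not reproduce the Feature Selection node; it scores each field by how often the majority target class per field value is correct, and flags fields at or above a threshold. The function names, the 0.99 threshold, and the toy data (modeled on the late-letter example) are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def predictor_accuracy(values, target):
    """Accuracy of predicting the majority target class for each field value."""
    by_value = defaultdict(Counter)
    for v, t in zip(values, target):
        by_value[v][t] += 1
    correct = sum(c.most_common(1)[0][1] for c in by_value.values())
    return correct / len(target)


def flag_perfect_predictors(columns, target, threshold=0.99):
    """Names of fields whose single-field accuracy meets the (assumed) threshold."""
    return [name for name, values in columns.items()
            if predictor_accuracy(values, target) >= threshold]


# Hypothetical toy data: the letter is a consequence of lateness, not a cause.
late = [1, 1, 0, 0, 1, 0]
columns = {
    "late_letter_sent": [1, 1, 0, 0, 1, 0],
    "region": ["N", "S", "N", "S", "N", "S"],
}
suspects = flag_perfect_predictors(columns, late)
print(suspects)
```

Fields flagged this way are candidates for review, not automatic removal; whether a field could have been known at prediction time is a judgment about the business process.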

Running a Statistics node on an anti-join to evaluate the potential missing data

There is typically some data loss when various data tables are integrated. Although we won't discuss data integration until a later chapter, it is important to gauge what (and how much) is lost at this stage. Financial variables are usually aggregated in very different ways for the financial planner and the data miner. It is critical that data miners periodically translate their data back into a form that middle and senior management will recognize, so that the two groups can communicate better.

The data miner deals with transactions and individual customer data, the language of individual rows of data. The manager speaks, generally, the language of spreadsheets: regions, product lines, months rolled up into aggregated cells in Excel.

On a project, we once discovered that a small percentage of missing rows represented a larger fraction of revenue than average—much larger actually. We suddenly...
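The comparison described above can be sketched in plain Python, assuming two hypothetical tables keyed on customer_id. An anti-join keeps the rows that find no match, and a simple summary then compares the share of rows lost with the share of revenue lost. The field names and figures are invented for illustration and do not come from the recipe's data.

```python
def anti_join(left_rows, right_rows, key):
    """Rows of left_rows whose key value has no match in right_rows."""
    right_keys = {row[key] for row in right_rows}
    return [row for row in left_rows if row[key] not in right_keys]


def share_of_total(all_rows, subset, field):
    """Fraction of the field's total that is carried by the subset."""
    total = sum(r[field] for r in all_rows)
    return sum(r[field] for r in subset) / total if total else 0.0


# Hypothetical data: customer 3 is missing from the customer table.
transactions = [
    {"customer_id": 1, "revenue": 100.0},
    {"customer_id": 2, "revenue": 50.0},
    {"customer_id": 3, "revenue": 850.0},
    {"customer_id": 4, "revenue": 0.0},
]
customers = [{"customer_id": 1}, {"customer_id": 2}, {"customer_id": 4}]

lost = anti_join(transactions, customers, "customer_id")
row_share = len(lost) / len(transactions)
revenue_share = share_of_total(transactions, lost, "revenue")
print(f"rows lost: {row_share:.0%}, revenue lost: {revenue_share:.0%}")
```

In this toy case a quarter of the rows carries most of the revenue, which is the kind of imbalance a summary of the unmatched rows is meant to expose.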

Evaluating the use of sampling for speed

Modern data mining practice is somewhat different from the ideal. Data miners certainly do develop valuable models that are used in the business and many have massive resources of data to mine, even more data than might have been foreseen a generation ago. But not all data miners meet the profile of a business user, someone whose primary work responsibility is not data analysis and who is not trained in, or concerned with, statistical methods. Nor does the modern data miner shy away from sampling.

In practice, it has been difficult to make discoveries and build models quickly when working with massive quantities of data. Although data mining tools may be designed to streamline the process, it still takes longer for each operation to complete on a large amount of data than it would with a smaller quantity. Thus, sampling can be extremely useful.
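As a rough illustration of the trade-off, the following sketch draws a reproducible simple random sample in plain Python and compares a statistic on the sample with the same statistic on the full data. It is not the Modeler Sample node; the function name, the 10 percent fraction, and the seed are assumptions.

```python
import random


def simple_random_sample(rows, fraction, seed=42):
    """Draw a reproducible simple random sample without replacement."""
    rng = random.Random(seed)  # fixed seed so repeated runs match
    k = max(1, round(len(rows) * fraction))
    return rng.sample(rows, k)


# Hypothetical stand-in for a large table: one numeric value per record.
rows = list(range(10_000))
sample = simple_random_sample(rows, 0.10)

full_mean = sum(rows) / len(rows)
sample_mean = sum(sample) / len(sample)
print(len(sample), full_mean, sample_mean)
```

Fixing the seed matters during exploration: every downstream step sees the same records, so differences between runs come from the analysis rather than from the draw.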

Getting ready

We will start with a blank stream, and will be using the cup98lrn reduced vars2.txt data set...

Removing redundant variables using correlation matrices

In this recipe we will remove redundant variables by building a correlation matrix that identifies highly correlated variables.

Getting ready

This recipe uses the datafile, nasadata.txt and the stream file, recipe_variableselection_correlations.str.

You will need a copy of Microsoft Excel to visualize the correlation matrix.

How to do it...

To remove redundant variables using correlation matrices:

  1. Open the stream, recipe_variableselection_correlations.str by navigating to File | Open Stream.
  2. Make sure the datafile points to the correct path to the file nasadata.txt.
  3. Open the Type node named Correlation Types. Notice that there are several variables of type continuous whose direction values have been set to Input, and a single continuous variable has its direction set to Target. The variable set to Target can be any variable that won't be an input to the model. If you don't have a good candidate, you can create a random variable and...
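The underlying idea can be sketched outside Modeler and Excel: compute pairwise Pearson correlations and list the pairs above a cutoff, so that one field of each pair can be dropped. The field names, values, and 0.9 cutoff below are hypothetical and are not taken from nasadata.txt.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (non-constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def redundant_pairs(columns, threshold=0.9):
    """Pairs of fields whose absolute correlation meets the (assumed) cutoff."""
    pairs = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            pairs.append((a, b, round(r, 3)))
    return pairs


# Hypothetical fields: band2 is nearly twice band1, band3 is unrelated.
columns = {
    "band1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "band2": [2.1, 4.0, 6.2, 7.9, 10.0],
    "band3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
pairs = redundant_pairs(columns)
print(pairs)
```

Which member of a correlated pair to keep is a modeling decision; the sketch only surfaces the candidates.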

Selecting variables using the CHAID Modeling node

In this recipe we will identify and select variables to include as model inputs using the CHAID node.

You will need a copy of Microsoft Excel to visualize and select the chi-square values for each variable.

Getting ready

This recipe uses the datafile cup98lrn_reduced_vars3.sav and the stream recipe_variableselection_chaid.str.

How to do it...

To identify and select variables to include as model inputs using the CHAID node:

  1. Open the stream recipe_variableselection_chaid.str by navigating to File | Open Stream and selecting the stream.
  2. Make sure the datafile points to the correct path for the file cup98lrn_reduced_vars3.sav.
  3. Open the Type node named CHAID Types. Notice that there are several variables of type continuous whose direction values have been set to Input, and a single continuous variable has its direction set to Target. The variable set to Target should be the target variable TARGET_B.
  4. Open the node TARGET_B and select the Interactive Model option...
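To make the ranking idea concrete, here is a small sketch that computes a plain Pearson chi-square statistic for each categorical field against a binary target and sorts the fields by it. This is not the CHAID algorithm, which also merges categories and adjusts significance levels; the field names and data are hypothetical.

```python
from collections import Counter


def chi_square(values, target):
    """Pearson chi-square statistic for a field-by-target contingency table."""
    n = len(target)
    observed = Counter(zip(values, target))
    row_totals = Counter(values)
    col_totals = Counter(target)
    stat = 0.0
    for v, rt in row_totals.items():
        for t, ct in col_totals.items():
            expected = rt * ct / n
            stat += (observed[(v, t)] - expected) ** 2 / expected
    return stat


# Hypothetical data: "strong" separates the target perfectly, "weak" only partly.
target = [1, 1, 1, 0, 0, 0, 1, 0]
fields = {
    "weak": ["x", "y", "x", "y", "x", "y", "x", "y"],
    "strong": ["a", "a", "a", "b", "b", "b", "a", "b"],
}
ranking = sorted(((name, chi_square(vals, target)) for name, vals in fields.items()),
                 key=lambda item: item[1], reverse=True)
print(ranking)
```

Comparing raw statistics is only fair when the fields have the same number of categories; otherwise p-values or a degrees-of-freedom adjustment would be needed.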

Selecting variables using the Means node

In this recipe we will identify and select variables to include as model inputs using the Means node.

Getting ready

This recipe uses the datafile cup98lrn_reduced_vars3.sav and the stream recipe_variableselection_means.str.

You will need a copy of Microsoft Excel to visualize the list of rules (optional).

How to do it...

To identify and select variables to include as model inputs using the Means node:

  1. Open the stream recipe_variableselection_means.str by navigating to File | Open Stream.
  2. Make sure the datafile points to the correct path to the file cup98lrn_reduced_vars3.sav.
  3. Open the Means node to look at the options. Note that the grouping variable is our target variable TARGET_B, and the test fields are all the continuous variables of interest as shown in the following figure.
  4. Run the Means node by clicking on Run.
  5. Inside the output window, click on the Importance column twice so that the variables are sorted in descending order of Importance as shown in the following...
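A comparable ranking can be sketched in plain Python by computing, for each continuous field, a Welch t statistic between the two target groups and sorting by its absolute value. This is a stand-in for illustration only and does not reproduce the Means node's Importance column; the field names and values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(values, target):
    """Welch t statistic comparing a field's mean across target = 1 and target = 0."""
    g1 = [v for v, t in zip(values, target) if t == 1]
    g0 = [v for v, t in zip(values, target) if t == 0]
    se = sqrt(variance(g1) / len(g1) + variance(g0) / len(g0))
    return (mean(g1) - mean(g0)) / se


# Hypothetical data: gift_amount differs by group, age does not.
target = [1, 1, 1, 0, 0, 0]
fields = {
    "age": [40, 50, 60, 45, 50, 55],
    "gift_amount": [20, 22, 24, 10, 12, 14],
}
ranking = sorted(((name, welch_t(vals, target)) for name, vals in fields.items()),
                 key=lambda item: abs(item[1]), reverse=True)
print(ranking)
```

Each group needs at least two records and some spread for the statistic to be defined.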

Selecting variables using single-antecedent Association Rules

In this recipe we will identify and select variables to include as model inputs using the Apriori Association Rules node. We will select the top 24 predictors based on Association Rules variable selection. We will use the same KDD Cup 1998 data set, but this version of the data was prepared with the stream Recipe - variable selection apriori data prep.str to create quintile versions of continuous variables. The target variable is the top quintile in donation amounts, TARGET_D between $20 and $200.

Getting ready

This recipe uses the datafile cup98lrn_reduced_vars3_apriori.sav and the stream Recipe - variable selection apriori.str.

You will need a copy of Microsoft Excel to visualize the list of rules.

How to do it...

To identify and select variables to include as model inputs using the Apriori Association Rules node:

  1. Open the stream Recipe - variable selection apriori.str by navigating to File | Open Stream.
  2. Make sure the datafile points...
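The notion of a single-antecedent rule can be illustrated without Modeler: for every field=value condition, count how often it occurs and how often the target holds when it does. The sketch below is not the Apriori node; it simply enumerates one-condition rules, and here "support" means the share of records matching the antecedent. All field names, values, and the minimum support are assumptions.

```python
def single_antecedent_rules(rows, target_field, target_value, min_support=0.1):
    """One-condition rules 'field=value -> target', sorted by confidence."""
    n = len(rows)
    base_rate = sum(r[target_field] == target_value for r in rows) / n
    counts = {}  # (field, value) -> [antecedent count, antecedent-and-target count]
    for r in rows:
        hit = r[target_field] == target_value
        for field, value in r.items():
            if field == target_field:
                continue
            c = counts.setdefault((field, value), [0, 0])
            c[0] += 1
            c[1] += hit
    rules = []
    for (field, value), (ante, both) in counts.items():
        support = ante / n
        if support < min_support:
            continue
        confidence = both / ante
        lift = confidence / base_rate if base_rate else 0.0
        rules.append({"antecedent": f"{field}={value}", "support": support,
                      "confidence": confidence, "lift": lift})
    return sorted(rules, key=lambda r: r["confidence"], reverse=True)


# Hypothetical quintile-coded records with a binary target.
rows = [
    {"recency_q": 5, "freq_q": 5, "top_donor": 1},
    {"recency_q": 5, "freq_q": 1, "top_donor": 1},
    {"recency_q": 1, "freq_q": 5, "top_donor": 0},
    {"recency_q": 1, "freq_q": 1, "top_donor": 0},
]
rules = single_antecedent_rules(rows, "top_donor", 1)
print(rules[0])
```

Fields that appear in the highest-confidence rules are the candidates for model inputs; taking the top N distinct fields from the sorted list gives a selection in the spirit of the recipe.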

Key benefits

  • Go beyond mere insight and build models that you can deploy in the day-to-day running of your business
  • Save time and effort while getting more value from your data than ever before
  • Loaded with detailed step-by-step examples that show you exactly how it's done by the best in the business

Description

IBM SPSS Modeler is a data mining workbench that enables you to explore data, identify important relationships that you can leverage, and build predictive models quickly allowing your organization to base its decisions on hard data not hunches or guesswork. IBM SPSS Modeler Cookbook takes you beyond the basics and shares the tips, the timesavers, and the workarounds that experts use to increase productivity and extract maximum value from data. The authors of this book are among the very best of these exponents, gurus who, in their brilliant and imaginative use of the tool, have pushed back the boundaries of applied analytics. By reading this book, you are learning from practitioners who have helped define the state of the art. Follow the industry standard data mining process, gaining new skills at each stage, from loading data to integrating results into everyday business practices. Get a handle on the most efficient ways of extracting data from your own sources, preparing it for exploration and modeling. Master the best methods for building models that will perform well in the workplace. Go beyond the basics and get the full power of your data mining workbench with this practical guide.

Who is this book for?

If you have had some hands-on experience with IBM SPSS Modeler and now want to go deeper and take more control over your data mining process, this is the guide for you. It is ideal for practitioners who want to break into advanced analytics.

What you will learn

  • Use and understand the industry standard CRISP-DM process for data mining.
  • Assemble data simply, quickly, and correctly using the full power of extraction, transformation, and loading (ETL) tools.
  • Control the amount of time you spend organizing and formatting your data.
  • Develop predictive models that stand up to the demands of real-life applications.
  • Take your modeling to the next level beyond default settings and learn the tips that the experts use.
  • Learn why the best model is not always the most accurate one.
  • Master deployment techniques that put your discoveries to work, making the most of your business's most critical resources.
  • Challenge yourself with scripting for ultimate control and automation - it's easier than you think!

Product Details

Publication date: Oct 24, 2013
Length: 382 pages
Edition: 1st
Language: English
ISBN-13: 9781849685474
Vendor: IBM



Table of Contents

10 Chapters
1. Data Understanding
2. Data Preparation – Select
3. Data Preparation – Clean
4. Data Preparation – Construct
5. Data Preparation – Integrate and Format
6. Selecting and Building a Model
7. Modeling – Assessment, Evaluation, Deployment, and Monitoring
8. CLEM Scripting
A. Business Understanding
Index

Customer reviews

Rating: 4.4 (20 Ratings)
5 star 65%
4 star 20%
3 star 5%
2 star 5%
1 star 5%

Top Reviews

Amazon Customer, Dec 09, 2014 (5 stars, Amazon verified review)
Short of paying a fortune to IBM for training. This all you need to get started esp if you have any kind of background with SAS or SPSS

Steve F., Feb 01, 2015 (5 stars, Amazon verified review)
Excellent book. SPSS Modeler comes with a ton of really good demos and this book covers everything the demos don't.

Gordon Curzon, Jan 09, 2014 (5 stars, Amazon verified review)
A must read for all SPSS users even if just for a refresh.Love the way the book is structured.

Kamau Njenga, Mar 09, 2014 (5 stars, Amazon verified review)
This book is a must have for anyone learning to use SPSS modeler or looking to learn data mining with SPSS. The examples were great and easy to follow.

Terry Taerum, Dec 14, 2013 (5 stars, Amazon verified review)
In the competitive world of analytical consulting, when getting the job done in a reasonable time is critical, it's helpful to know what others have done to solve similar challenges, particularly when using IBM SPSS Modeler. The text, combined with data samples and example streams delivers the shortest possible distance between problem and solution.

FAQs

How do I buy and download an eBook?

Where there is an eBook version of a title available, you can buy it from the book details for that title. Add either the standalone eBook or the eBook and print book bundle to your shopping cart. Your eBook will show in your cart as a product on its own. After completing checkout and payment in the normal way, you will receive your receipt on the screen containing a link to a personalised PDF download file. This link will remain active for 30 days. You can download backup copies of the file by logging in to your account at any time.

If you already have Adobe reader installed, then clicking on the link will download and open the PDF file directly. If you don't, then save the PDF file on your machine and download the Reader to view it.

Please Note: Packt eBooks are non-returnable and non-refundable.

Packt eBook and Licensing: When you buy an eBook from Packt Publishing, completing your purchase means you accept the terms of our licence agreement. Please read the full text of the agreement. In it we have tried to balance the need for the ebook to be usable for you the reader with our needs to protect the rights of us as Publishers and of our authors. In summary, the agreement says:

  • You may make copies of your eBook for your own use onto any machine
  • You may not pass copies of the eBook on to anyone else

How can I make a purchase on your website?

If you want to purchase a video course, eBook, or Bundle (Print+eBook), please follow the steps below:

  1. Register on our website using your email address and the password.
  2. Search for the title by name or ISBN using the search option.
  3. Select the title you want to purchase.
  4. Choose the format you wish to purchase the title in; if you order the Print Book, you get a free eBook copy of the same title. 
  5. Proceed with the checkout process (payment to be made using Credit Card, Debit Card, or PayPal)

Where can I access support around an eBook?

  • If you experience a problem with using or installing Adobe Reader, then contact Adobe directly.
  • To view the errata for the book, see www.packtpub.com/support and view the pages for the title you have.
  • To view your account details or to download a new copy of the book go to www.packtpub.com/account
  • To contact us directly if a problem is not resolved, use www.packtpub.com/contact-us

What eBook formats does Packt support?

Our eBooks are currently available in a variety of formats such as PDF and ePubs. In the future, this may well change with trends and development in technology, but please note that our PDFs are not Adobe eBook Reader format, which has greater restrictions on security.

You will need to use Adobe Reader v9 or later in order to read Packt's PDF eBooks.

What are the benefits of eBooks?

  • You can get the information you need immediately
  • You can easily take them with you on a laptop
  • You can download them an unlimited number of times
  • You can print them out
  • They are copy-paste enabled
  • They are searchable
  • There is no password protection
  • They are lower priced than print
  • They save resources and space

What is an eBook?

Packt eBooks are a complete electronic version of the print edition, available in PDF and ePub formats. Every piece of content down to the page numbering is the same. Because we save the costs of printing and shipping the book to you, we are able to offer eBooks at a lower cost than print editions.

When you have purchased an eBook, simply log in to your account and click on the link in Your Download Area. We recommend saving the file to your hard drive before opening it.

For optimal viewing of our eBooks, we recommend you download and install the free Adobe Reader version 9.