IBM SPSS Modeler Cookbook

IBM SPSS Modeler Cookbook: If you've already had some experience with IBM SPSS Modeler, this cookbook will help you delve deeper and exploit the incredible potential of this data mining workbench. The recipes come from some of the best brains in the business.

By Keith McCormick and Abbott
Rating: 4.4 out of 5 (20 Ratings)
Paperback Oct 2013 382 pages 1st Edition
eBook
€37.99 €42.99
Paperback
€53.99
Subscription
Free Trial
Renews at $19.99p/m

What do you get with a Packt Subscription?

Free for first 7 days. $19.99 p/m after that. Cancel any time!
  • Unlimited ad-free access to the largest independent learning library in tech. Access this title and thousands more!
  • 50+ new titles added per month, including many first-to-market concepts and exclusive early access to books as they are being written.
  • Innovative learning tools, including AI book assistants, code context explainers, and text-to-speech.
  • Thousands of reference materials covering every tech concept you need to stay up to date.

IBM SPSS Modeler Cookbook

Chapter 2. Data Preparation – Select

In this chapter, we will cover:

  • Using the Feature Selection node creatively to remove or decapitate perfect predictors
  • Running a Statistics node on an anti-join to evaluate the potential missing data
  • Evaluating the use of sampling for speed
  • Removing redundant variables using correlation matrices
  • Selecting variables using the CHAID Modeling node
  • Selecting variables using the Means node
  • Selecting variables using single-antecedent Association Rules

Introduction

This chapter focuses on just the first task, Select, of the data preparation phase:

Decide on the data to be used for analysis. Criteria include relevance to the data mining goals, quality, and technical constraints such as limits on data volume or data types. Note that data selection covers selection of attributes (columns) as well as selection of records (rows) in a table.

Ideally, data mining empowers business people to discover valuable patterns in large quantities of data, to develop useful models and integrate them into the business quickly and easily. The name data mining suggests that large quantities of data will be involved, that the object is to extract rare and elusive bits of the data, and that data mining calls for working with data in bulk—no sampling.

New data miners are often struck by how much selection and sampling is actually done. For some, the stereotypical data miner dives in and looks at everything. It is unclear how such an unfocused search would...

Using the Feature Selection node creatively to remove or decapitate perfect predictors

In this recipe, we will identify perfect or near-perfect predictors in order to ensure that they do not contaminate our model. Perfect predictors earn their name by being correct 100 percent of the time, which usually indicates circular logic rather than a prediction of value. It is a common and serious problem.

When this occurs we have accidentally allowed information into the model that could not possibly be known at the time of the prediction. Everyone 30 days late on their mortgage receives a late letter, but receiving a late letter is not a good predictor of lateness because their lateness caused the letter, not the other way around.

The rather colorful term decapitate is borrowed from the data miner Dorian Pyle. It is a reference to the fact that perfect predictors will be found at the top of any list of key drivers ("caput" means head in Latin). Therefore, to decapitate is to remove the variable...
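
Outside Modeler, the same check can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the algorithm the Feature Selection node uses: it scores each field by how often the target can be guessed from that field alone and flags anything at or above a threshold. The field names (`late_letter`, `region`, `late`), the data, and the 0.99 threshold are all hypothetical.

```python
from collections import Counter, defaultdict

def single_field_accuracy(rows, field, target):
    """Accuracy of predicting `target` from `field` alone by guessing
    the most common target value seen for each value of the field."""
    by_value = defaultdict(Counter)
    for row in rows:
        by_value[row[field]][row[target]] += 1
    correct = sum(counts.most_common(1)[0][1] for counts in by_value.values())
    return correct / len(rows)

def flag_perfect_predictors(rows, target, threshold=0.99):
    """Return the fields whose single-field accuracy meets the threshold."""
    fields = [f for f in rows[0] if f != target]
    return [f for f in fields
            if single_field_accuracy(rows, f, target) >= threshold]

# Hypothetical mortgage data: the late letter is a consequence of being late.
rows = [
    {"late_letter": "yes", "region": "north", "late": 1},
    {"late_letter": "yes", "region": "south", "late": 1},
    {"late_letter": "no",  "region": "north", "late": 0},
    {"late_letter": "no",  "region": "south", "late": 0},
    {"late_letter": "no",  "region": "north", "late": 0},
    {"late_letter": "no",  "region": "south", "late": 0},
]

suspects = flag_perfect_predictors(rows, target="late")
print(suspects)
```

A flagged field is a candidate for review, not automatic removal. A field with a unique value per record, such as an ID, would also score 100 percent in this sketch, so each suspect still needs a look from someone who knows when the field becomes available.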

Running a Statistics node on an anti-join to evaluate the potential missing data

There is typically some data loss when various data tables are integrated. Although we won't discuss data integration until a later chapter, it is important to gauge what (and how much) is lost at this stage. Financial variables are usually aggregated in very different ways for the financial planner and the data miner. It is critical that data miners periodically translate their data back into a form that middle and senior management will recognize so that they can better communicate.

The data miner deals with transactions and individual customer data, the language of individual rows of data. The manager speaks, generally, the language of spreadsheets: regions, product lines, months rolled up into aggregated cells in Excel.

On a project, we once discovered that a small percentage of missing rows represented a larger fraction of revenue than average—much larger actually. We suddenly...
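
The idea can be sketched in plain Python, independent of Modeler. An anti-join keeps exactly the rows that an inner join would silently drop, and summary statistics on those rows show whether the loss matters. The tables, the `cust_id` key, and the revenue figures below are hypothetical.

```python
from statistics import mean

# Hypothetical tables: transactions and the customer table they are joined to.
transactions = [
    {"cust_id": 1, "revenue": 120.0},
    {"cust_id": 2, "revenue": 80.0},
    {"cust_id": 3, "revenue": 100.0},
    {"cust_id": 9, "revenue": 900.0},  # no matching customer record
]
customers = [{"cust_id": 1}, {"cust_id": 2}, {"cust_id": 3}]

known_ids = {c["cust_id"] for c in customers}

# Anti-join: keep the transactions that have no match in the customer table.
unmatched = [t for t in transactions if t["cust_id"] not in known_ids]
matched = [t for t in transactions if t["cust_id"] in known_ids]

total_revenue = sum(t["revenue"] for t in transactions)
lost_revenue = sum(t["revenue"] for t in unmatched)

row_share = len(unmatched) / len(transactions)   # share of rows lost
revenue_share = lost_revenue / total_revenue     # share of revenue lost

print(f"rows lost: {row_share:.0%}, revenue lost: {revenue_share:.0%}")
print("mean revenue, matched:", mean(t["revenue"] for t in matched))
print("mean revenue, unmatched:", mean(t["revenue"] for t in unmatched))
```

Comparing the row share with the revenue share is the point of the exercise: in this toy data a quarter of the rows carries three quarters of the revenue, which is the kind of gap that a row count alone would hide.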

Evaluating the use of sampling for speed

Modern data mining practice is somewhat different from the ideal. Data miners certainly do develop valuable models that are used in the business and many have massive resources of data to mine, even more data than might have been foreseen a generation ago. But not all data miners meet the profile of a business user, someone whose primary work responsibility is not data analysis and who is not trained in, or concerned with, statistical methods. Nor does the modern data miner shy away from sampling.

In practice, it has been difficult to make discoveries and build models quickly when working with massive quantities of data. Although data mining tools may be designed to streamline the process, it still takes longer for each operation to complete on a large amount of data than it would with a smaller quantity. Thus, sampling can be extremely useful.

Getting ready

We will start with a blank stream, and will be using the cup98lrn reduced vars2.txt data set...
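
As a rough illustration of why a sample is often good enough, the following plain-Python sketch draws a 5 percent simple random sample from a synthetic column and compares its mean with the full-data mean. The data is generated, not taken from the recipe's data set, and the sample rate and seed are arbitrary assumptions.

```python
import random
from statistics import fmean

random.seed(42)  # fixed seed so the sample is reproducible

# Hypothetical stand-in for a large table: 100,000 numeric values.
population = [random.gauss(50.0, 10.0) for _ in range(100_000)]

# Draw a 5 percent simple random sample without replacement.
sample = random.sample(population, k=len(population) // 20)

full_mean = fmean(population)
sample_mean = fmean(sample)
print(f"rows: {len(population)} full, {len(sample)} sampled")
print(f"mean: {full_mean:.2f} full, {sample_mean:.2f} sampled")
```

Every later operation on `sample` touches one twentieth of the rows. Whether the summary statistics stay close enough depends on the data; rare events in particular may need a larger or stratified sample.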

Removing redundant variables using correlation matrices

In this recipe we will remove redundant variables by building a correlation matrix that identifies highly correlated variables.

Getting ready

This recipe uses the datafile nasadata.txt and the stream file recipe_variableselection_correlations.str.

You will need a copy of Microsoft Excel to visualize the correlation matrix.

How to do it...

To remove redundant variables using correlation matrices:

  1. Open the stream recipe_variableselection_correlations.str by navigating to File | Open Stream.
  2. Make sure the datafile points to the correct path to the file nasadata.txt.
  3. Open the Type node named Correlation Types. Notice that there are several variables of type continuous whose direction values have been set to Input, and a single continuous variable has its direction set to Target. The variable set to Target can be any variable that won't be an input to the model. If you don't have a good candidate, you can create a random variable and...
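
For readers who want to see the underlying idea without Modeler or Excel, here is a minimal plain-Python sketch. It computes pairwise Pearson correlations and greedily drops one field from each highly correlated pair. The column names, the values, and the 0.9 cutoff are hypothetical, and the greedy rule (keep the first field of a pair) is one simple choice, not the procedure the recipe's stream follows.

```python
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def redundant_fields(columns, threshold=0.9):
    """For each pair with |r| above the threshold, mark the second
    field as redundant unless one of the pair was already dropped."""
    dropped = set()
    for a, b in combinations(columns, 2):
        if a in dropped or b in dropped:
            continue
        if abs(pearson(columns[a], columns[b])) > threshold:
            dropped.add(b)
    return dropped

# Hypothetical columns: temp_f is a linear rescaling of temp_c.
columns = {
    "temp_c": [10.0, 12.0, 15.0, 20.0, 25.0],
    "temp_f": [50.0, 53.6, 59.0, 68.0, 77.0],
    "humidity": [30.0, 80.0, 45.0, 60.0, 40.0],
}

to_drop = redundant_fields(columns)
print(to_drop)
```

Which member of a correlated pair to keep is a judgment call; in practice the field that is easier to explain or cheaper to collect is usually the better survivor.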

Selecting variables using the CHAID Modeling node

In this recipe we will identify and select variables to include as model inputs using the CHAID node.

You will need a copy of Microsoft Excel to visualize and select the chi-square values for each variable.

Getting ready

This recipe uses the datafile cup98lrn_reduced_vars3.sav and the stream recipe_variableselection_chaid.str.

How to do it...

To identify and select variables to include as model inputs using the CHAID node:

  1. Open the stream recipe_variableselection_chaid.str by navigating to File | Open Stream and selecting the stream.
  2. Make sure the datafile points to the correct path for the file cup98lrn_reduced_vars3.sav.
  3. Open the Type node named CHAID Types. Notice that there are several variables of type continuous whose direction values have been set to Input, and a single continuous variable has its direction set to Target. The variable set to Target should be the target variable TARGET_B.
  4. Open the node TARGET_B and select the Interactive Model option...
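
The chi-square values this recipe ranks can be illustrated in plain Python. The sketch below computes the raw Pearson chi-square statistic of each candidate field against the target and sorts the fields by it. It does not reproduce what the CHAID node does internally (no category merging, no adjusted p-values), and the `homeowner` and `region` fields and their values are hypothetical; only the target name TARGET_B comes from the recipe.

```python
from collections import Counter

def chi_square(rows, field, target):
    """Pearson chi-square statistic for the field-by-target contingency table."""
    observed = Counter((row[field], row[target]) for row in rows)
    field_totals = Counter(row[field] for row in rows)
    target_totals = Counter(row[target] for row in rows)
    n = len(rows)
    stat = 0.0
    for f, f_count in field_totals.items():
        for t, t_count in target_totals.items():
            expected = f_count * t_count / n
            stat += (observed[(f, t)] - expected) ** 2 / expected
    return stat

# Hypothetical records: homeowner is related to TARGET_B, region is not.
rows = [
    {"homeowner": "yes", "region": "north", "TARGET_B": 1},
    {"homeowner": "yes", "region": "north", "TARGET_B": 1},
    {"homeowner": "yes", "region": "south", "TARGET_B": 1},
    {"homeowner": "yes", "region": "south", "TARGET_B": 0},
    {"homeowner": "no",  "region": "south", "TARGET_B": 1},
    {"homeowner": "no",  "region": "north", "TARGET_B": 0},
    {"homeowner": "no",  "region": "north", "TARGET_B": 0},
    {"homeowner": "no",  "region": "south", "TARGET_B": 0},
]

candidates = ["homeowner", "region"]
ranking = sorted(((f, chi_square(rows, f, "TARGET_B")) for f in candidates),
                 key=lambda pair: pair[1], reverse=True)
for field, stat in ranking:
    print(f"{field}: chi-square = {stat:.2f}")
```

Raw statistics are only comparable between fields with the same number of categories; with mixed cardinalities, a p-value that accounts for degrees of freedom is the fairer yardstick.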

Selecting variables using the Means node

In this recipe we will identify and select variables to include as model inputs using the Means node.

Getting ready

This recipe uses the datafile cup98lrn_reduced_vars3.sav and the stream recipe_variableselection_means.str.

You will need a copy of Microsoft Excel to visualize the list of rules (optional).

How to do it...

To identify and select variables to include as model inputs using the Means node:

  1. Open the stream recipe_variableselection_means.str by navigating to File | Open Stream.
  2. Make sure the datafile points to the correct path to the file cup98lrn_reduced_vars3.sav.
  3. Open the Means node to look at the options. Note that the grouping variable is our target variable TARGET_B, and the test fields are all the continuous variables of interest as shown in the following figure.
  4. Run the Means node by clicking on Run.
  5. Inside the output window, click on the Importance column twice so that the variables are sorted in descending order of Importance as shown in the following...
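
The comparison behind this recipe can be sketched in plain Python: split each continuous field by the target and measure how far apart the group means are relative to their spread. The sketch ranks fields by the absolute Welch t statistic as a stand-in; the node's own Importance figure is not reproduced here. The `last_gift` and `age` fields and all values are hypothetical; only TARGET_B comes from the recipe.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Welch t statistic for the difference between two group means."""
    va, vb = variance(group_a), variance(group_b)
    standard_error = sqrt(va / len(group_a) + vb / len(group_b))
    return (mean(group_a) - mean(group_b)) / standard_error

# Hypothetical records: last_gift differs by target group, age barely does.
rows = [
    {"TARGET_B": 1, "last_gift": 25.0, "age": 60.0},
    {"TARGET_B": 1, "last_gift": 30.0, "age": 45.0},
    {"TARGET_B": 1, "last_gift": 28.0, "age": 52.0},
    {"TARGET_B": 0, "last_gift": 10.0, "age": 50.0},
    {"TARGET_B": 0, "last_gift": 12.0, "age": 58.0},
    {"TARGET_B": 0, "last_gift": 8.0,  "age": 47.0},
]

def t_for_field(field):
    """Split one field by the target and compare the two groups."""
    responders = [r[field] for r in rows if r["TARGET_B"] == 1]
    others = [r[field] for r in rows if r["TARGET_B"] == 0]
    return welch_t(responders, others)

ranked = sorted(((f, t_for_field(f)) for f in ["last_gift", "age"]),
                key=lambda pair: abs(pair[1]), reverse=True)
for field, t in ranked:
    print(f"{field}: t = {t:.2f}")
```

Sorting by the absolute value matters because a field whose mean is much lower for responders is just as useful as one whose mean is much higher.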

Selecting variables using single-antecedent Association Rules

In this recipe we will identify and select variables to include as model inputs using the Apriori Association Rules node. We will select the top 24 predictors based on Association Rules variable selection. We will use the same KDD Cup 1998 data set, but this version of the data was prepared with the stream Recipe - variable selection apriori data prep.str to create quintile versions of continuous variables. The target variable is the top quintile in donation amounts, TARGET_D between $20 and $200.

Getting ready

This recipe uses the datafile cup98lrn_reduced_vars3_apriori.sav and the stream Recipe - variable selection apriori.str.

You will need a copy of Microsoft Excel to visualize the list of rules.

How to do it...

To identify and select variables to include as model inputs using the Apriori Association Rules node:

  1. Open the stream Recipe - variable selection apriori.str by navigating to File | Open Stream.
  2. Make sure the datafile points...
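
To make the idea of a single-antecedent rule concrete, here is a minimal plain-Python sketch. It scores every rule of the form "field = value implies target = value" by support, confidence, and lift, then lists fields in order of their best rule. It is a brute-force count, not the Apriori algorithm, and the binned fields (`gift_bin`, `age_bin`), the target `top_donor`, and the data are hypothetical. Support here means the share of records that match the antecedent.

```python
from collections import Counter

def single_antecedent_rules(rows, target, target_value):
    """Score every rule (field = value) -> (target = target_value)."""
    n = len(rows)
    base_rate = sum(1 for row in rows if row[target] == target_value) / n
    antecedent_counts = Counter()
    joint_counts = Counter()
    for row in rows:
        for field, value in row.items():
            if field == target:
                continue
            antecedent_counts[(field, value)] += 1
            if row[target] == target_value:
                joint_counts[(field, value)] += 1
    rules = []
    for item, count in antecedent_counts.items():
        confidence = joint_counts[item] / count
        rules.append({
            "antecedent": item,
            "support": count / n,
            "confidence": confidence,
            "lift": confidence / base_rate,
        })
    return sorted(rules, key=lambda rule: rule["confidence"], reverse=True)

# Hypothetical quintile-binned records.
rows = [
    {"gift_bin": "q5", "age_bin": "q1", "top_donor": "T"},
    {"gift_bin": "q5", "age_bin": "q2", "top_donor": "T"},
    {"gift_bin": "q1", "age_bin": "q1", "top_donor": "F"},
    {"gift_bin": "q1", "age_bin": "q2", "top_donor": "F"},
    {"gift_bin": "q3", "age_bin": "q1", "top_donor": "F"},
    {"gift_bin": "q3", "age_bin": "q2", "top_donor": "T"},
]

rules = single_antecedent_rules(rows, target="top_donor", target_value="T")

# Fields in order of their best rule, each listed once.
top_fields = list(dict.fromkeys(rule["antecedent"][0] for rule in rules))
print(top_fields)
for rule in rules[:3]:
    print(rule)
```

Taking the first N entries of `top_fields` gives a variable shortlist in the spirit of the recipe. In real data a minimum support cutoff is worth adding so that a rule matching a handful of records cannot top the list on confidence alone.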

Key benefits

  • Go beyond mere insight and build models that you can deploy in the day-to-day running of your business
  • Save time and effort while getting more value from your data than ever before
  • Loaded with detailed step-by-step examples that show you exactly how it's done by the best in the business

Description

IBM SPSS Modeler is a data mining workbench that enables you to explore data, identify important relationships that you can leverage, and build predictive models quickly, allowing your organization to base its decisions on hard data, not hunches or guesswork. IBM SPSS Modeler Cookbook takes you beyond the basics and shares the tips, the timesavers, and the workarounds that experts use to increase productivity and extract maximum value from data. The authors of this book are among the very best of these exponents, gurus who, in their brilliant and imaginative use of the tool, have pushed back the boundaries of applied analytics. By reading this book, you are learning from practitioners who have helped define the state of the art. Follow the industry standard data mining process, gaining new skills at each stage, from loading data to integrating results into everyday business practices. Get a handle on the most efficient ways of extracting data from your own sources and preparing it for exploration and modeling. Master the best methods for building models that will perform well in the workplace. Go beyond the basics and get the full power of your data mining workbench with this practical guide.

Who is this book for?

If you have had some hands-on experience with IBM SPSS Modeler and now want to go deeper and take more control over your data mining process, this is the guide for you. It is ideal for practitioners who want to break into advanced analytics.

What you will learn

  • Use and understand the industry standard CRISP-DM process for data mining.
  • Assemble data simply, quickly, and correctly using the full power of extraction, transformation, and loading (ETL) tools.
  • Control the amount of time you spend organizing and formatting your data.
  • Develop predictive models that stand up to the demands of real-life applications.
  • Take your modeling to the next level beyond default settings and learn the tips that the experts use.
  • Learn why the best model is not always the most accurate one.
  • Master deployment techniques that put your discoveries to work, making the most of your business's most critical resources.
  • Challenge yourself with scripting for ultimate control and automation - it's easier than you think!

Product Details

Publication date : Oct 24, 2013
Length : 382 pages
Edition : 1st
Language : English
ISBN-13 : 9781849685467
Vendor : IBM


Packt Subscriptions

See our plans and pricing
$19.99 billed monthly
  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive Early access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Simple pricing, no contract
$199.99 billed annually
  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive Early access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Choose a DRM-free eBook or Video every month to keep
  • PLUS own as many other DRM-free eBooks or Videos as you like for just €5 each
  • Exclusive print discounts
$279.99 billed in 18 months
  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive Early access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Choose a DRM-free eBook or Video every month to keep
  • PLUS own as many other DRM-free eBooks or Videos as you like for just €5 each
  • Exclusive print discounts

Frequently bought together


IBM SPSS Modeler Essentials: €29.99
IBM SPSS Modeler Cookbook: €53.99
Total: €83.98

Table of Contents

10 Chapters
1. Data Understanding
2. Data Preparation – Select
3. Data Preparation – Clean
4. Data Preparation – Construct
5. Data Preparation – Integrate and Format
6. Selecting and Building a Model
7. Modeling – Assessment, Evaluation, Deployment, and Monitoring
8. CLEM Scripting
A. Business Understanding
Index

Customer reviews

Rating: 4.4 out of 5 (20 Ratings)
5 star 65%
4 star 20%
3 star 5%
2 star 5%
1 star 5%

Top Reviews

Amazon Customer, Dec 09, 2014 (5 out of 5)
Short of paying a fortune to IBM for training. This all you need to get started esp if you have any kind of background with SAS or SPSS
Amazon Verified review

Steve F., Feb 01, 2015 (5 out of 5)
Excellent book. SPSS Modeler comes with a ton of really good demos and this book covers everything the demos don't.
Amazon Verified review

Gordon Curzon, Jan 09, 2014 (5 out of 5)
A must read for all SPSS users even if just for a refresh. Love the way the book is structured.
Amazon Verified review

Kamau Njenga, Mar 09, 2014 (5 out of 5)
This book is a must have for anyone learning to use SPSS modeler or looking to learn data mining with SPSS. The examples were great and easy to follow.
Amazon Verified review

Terry Taerum, Dec 14, 2013 (5 out of 5)
In the competitive world of analytical consulting, when getting the job done in a reasonable time is critical, it's helpful to know what others have done to solve similar challenges, particularly when using IBM SPSS Modeler. The text, combined with data samples and example streams delivers the shortest possible distance between problem and solution.
Amazon Verified review

FAQs

What is included in a Packt subscription?

A subscription provides you with full access to view all Packt and licensed content online; this includes exclusive access to Early Access titles. Depending on the tier chosen, you can also earn credits and discounts to use for owning content.

How can I cancel my subscription?

To cancel your subscription with us simply go to the account page - found in the top right of the page or at https://subscription.packtpub.com/my-account/subscription - From here you will see the ‘cancel subscription’ button in the grey box with your subscription information in.

What are credits?

Credits can be earned from reading 40 sections of any title within the payment cycle - a month starting from the day of subscription payment. You also earn a Credit every month if you subscribe to our annual or 18 month plans. Credits can be used to buy books DRM free, the same way that you would pay for a book. Your credits can be found in the subscription homepage - subscription.packtpub.com - by clicking on the ‘my library’ dropdown and selecting ‘credits’.

What happens if an Early Access Course is cancelled?

Projects are rarely cancelled, but sometimes it's unavoidable. If an Early Access course is cancelled or excessively delayed, you can exchange your purchase for another course. For further details, please contact us here.

Where can I send feedback about an Early Access title?

If you have any feedback about the product you're reading, or Early Access in general, then please fill out a contact form here and we'll make sure the feedback gets to the right team. 

Can I download the code files for Early Access titles?

We try to ensure that all books in Early Access have code available to use, download, and fork on GitHub. This helps us be more agile in the development of the book, and helps keep the often changing code base of new versions and new technologies as up to date as possible. Unfortunately, however, there will be rare cases when it is not possible for us to have downloadable code samples available until publication.

When we publish the book, the code files will also be available to download from the Packt website.

How accurate is the publication date?

The publication date is as accurate as we can be at any point in the project. Unfortunately, delays can happen. Often those delays are out of our control, such as changes to the technology code base or delays in the tech release. We do our best to give you an accurate estimate of the publication date at any given time, and as more chapters are delivered, the more accurate the delivery date will become.

How will I know when new chapters are ready?

We'll let you know every time there has been an update to a course that you've bought in Early Access. You'll get an email to let you know there has been a new chapter, or a change to a previous chapter. The new chapters are automatically added to your account, so you can also check back there any time you're ready and download or read them online.

I am a Packt subscriber, do I get Early Access?

Yes, all Early Access content is fully available through your subscription. You will need to have a paid for or active trial subscription in order to access all titles.

How is Early Access delivered?

Early Access is currently only available as a PDF or through our online reader. As we make changes or add new chapters, the files in your Packt account will be updated so you can download them again or view them online immediately.

How do I buy Early Access content?

Early Access is a way of us getting our content to you quicker, but the method of buying the Early Access course is still the same. Just find the course you want to buy, go through the check-out steps, and you’ll get a confirmation email from us with information and a link to the relevant Early Access courses.

What is Early Access?

Keeping up to date with the latest technology is difficult; new versions, new frameworks, new techniques. This feature gives you a head-start to our content, as it's being created. With Early Access you'll receive each chapter as it's written, and get regular updates throughout the product's development, as well as the final course as soon as it's ready. We created Early Access as a means of giving you the information you need, as soon as it's available. As we go through the process of developing a course, 99% of it can be ready but we can't publish until that last 1% falls into place. Early Access helps to unlock the potential of our content early, to help you start your learning when you need it most. You not only get access to every chapter as it's delivered, edited, and updated, but you'll also get the finalized, DRM-free product to download in any format you want when it's published. As a member of Packt, you'll also be eligible for our exclusive offers, including a free course every day, and discounts on new and popular titles.