IBM SPSS Modeler Cookbook

IBM SPSS Modeler Cookbook: If you've already had some experience with IBM SPSS Modeler, this cookbook will help you delve deeper and exploit the incredible potential of this data mining workbench. The recipes come from some of the best brains in the business.

By Keith McCormick, Abbott
$70.99
Rating: 4.4 out of 5 (20 Ratings)
Paperback Oct 2013 382 pages 1st Edition
eBook: $37.99 $42.99
Paperback: $70.99
Subscription: Free Trial, renews at $19.99 per month

What do you get with Print?

  • Instant access to your digital copy whilst your Print order is shipped
  • Paperback book shipped to your preferred address
  • Redeem a companion digital copy on all Print orders
  • Access this title in our online reader with advanced features
  • DRM FREE - Read whenever, wherever and however you want

IBM SPSS Modeler Cookbook

Chapter 2. Data Preparation – Select

In this chapter, we will cover:

  • Using the Feature Selection node creatively to remove or decapitate perfect predictors
  • Running a Statistics node on an anti-join to evaluate the potential missing data
  • Evaluating the use of sampling for speed
  • Removing redundant variables using correlation matrices
  • Selecting variables using the CHAID Modeling node
  • Selecting variables using the Means node
  • Selecting variables using single-antecedent Association Rules

Introduction

This chapter focuses on just the first task, Select, of the data preparation phase:

Decide on the data to be used for analysis. Criteria include relevance to the data mining goals, quality, and technical constraints such as limits on data volume or data types. Note that data selection covers selection of attributes (columns) as well as selection of records (rows) in a table.

Ideally, data mining empowers business people to discover valuable patterns in large quantities of data, to develop useful models and integrate them into the business quickly and easily. The name data mining suggests that large quantities of data will be involved, that the object is to extract rare and elusive bits of the data, and that data mining calls for working with data in bulk—no sampling.

New data miners are often struck by how much selection and sampling is actually done. For some, the stereotypical data miner dives in and looks at everything. It is unclear how such an unfocused search would...

Using the Feature Selection node creatively to remove or decapitate perfect predictors

In this recipe, we will identify perfect or near-perfect predictors in order to ensure that they do not contaminate our model. Perfect predictors earn their name by being correct 100 percent of the time, usually indicating circular logic and not a prediction of value. It is a common and serious problem.

When this occurs we have accidentally allowed information into the model that could not possibly be known at the time of the prediction. Everyone 30 days late on their mortgage receives a late letter, but receiving a late letter is not a good predictor of lateness because their lateness caused the letter, not the other way around.

The rather colorful term decapitate is borrowed from the data miner Dorian Pyle. It is a reference to the fact that perfect predictors will be found at the top of any list of key drivers ("caput" means head in Latin). Therefore, to decapitate is to remove the variable...
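
Outside of Modeler, the idea of hunting for perfect predictors can be sketched in a few lines of Python. This is only an illustration and not the Feature Selection node's algorithm: it scores each field by how accurately the majority target value per level predicts the target, and flags anything at or above a threshold. The field names (late_letter, region, late) and the 0.99 threshold are hypothetical, chosen to mirror the mortgage example above.

```python
from collections import Counter, defaultdict


def single_field_accuracy(rows, field, target):
    """Accuracy obtained by predicting the majority target value for each level of `field`."""
    by_level = defaultdict(Counter)
    for row in rows:
        by_level[row[field]][row[target]] += 1
    correct = sum(counts.most_common(1)[0][1] for counts in by_level.values())
    return correct / len(rows)


def flag_suspect_predictors(rows, target, threshold=0.99):
    """Return fields whose single-field accuracy meets the (assumed) threshold, best first."""
    fields = [f for f in rows[0] if f != target]
    scores = {f: single_field_accuracy(rows, f, target) for f in fields}
    suspects = [f for f in fields if scores[f] >= threshold]
    return sorted(suspects, key=lambda f: -scores[f])


# Hypothetical data: the late letter is sent *because* the customer is late.
rows = [
    {"late_letter": "yes", "region": "N", "late": 1},
    {"late_letter": "yes", "region": "S", "late": 1},
    {"late_letter": "no", "region": "N", "late": 0},
    {"late_letter": "no", "region": "S", "late": 0},
    {"late_letter": "no", "region": "N", "late": 0},
    {"late_letter": "yes", "region": "S", "late": 1},
]

suspects = flag_suspect_predictors(rows, "late")
print(suspects)
```

A flagged field is only a candidate for removal. High-cardinality fields such as record IDs would also score perfectly under this measure, so each suspect still needs a judgment about whether it could be known at prediction time.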

Running a Statistics node on an anti-join to evaluate the potential missing data

There is typically some data loss when various data tables are integrated. Although we won't discuss data integration until a later chapter, it is important to gauge what (and how much) is lost at this stage. Financial variables are usually aggregated in very different ways for the financial planner and the data miner. It is critical that the data miner periodically translate the data of the data miner back into the form that middle and senior management will recognize so that they can better communicate.

The data miner deals with transactions and individual customer data, the language of individual rows of data. The manager speaks, generally, the language of spreadsheets: regions, product lines, months rolled up into aggregated cells in Excel.

On a project, we once discovered that a small percentage of missing rows represented a larger fraction of revenue than average—much larger actually. We suddenly...
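
The same check can be sketched outside of Modeler. The snippet below is a minimal illustration, not the recipe's stream: it performs an anti-join (rows in one table with no matching key in the other) and then compares the share of rows lost with the share of revenue lost. The table contents and the field names cust_id and revenue are hypothetical.

```python
def anti_join(left, right, key):
    """Rows of `left` whose key value has no match in `right`."""
    right_keys = {row[key] for row in right}
    return [row for row in left if row[key] not in right_keys]


def summarize(rows, field):
    """Count, sum, and mean of a numeric field (a tiny stand-in for summary statistics)."""
    values = [row[field] for row in rows]
    total = sum(values)
    return {"count": len(values), "sum": total, "mean": total / len(values) if values else 0.0}


# Hypothetical tables: customer 3 is missing from the customer table.
transactions = [
    {"cust_id": 1, "revenue": 100.0},
    {"cust_id": 2, "revenue": 120.0},
    {"cust_id": 3, "revenue": 900.0},
    {"cust_id": 4, "revenue": 80.0},
]
customers = [{"cust_id": 1}, {"cust_id": 2}, {"cust_id": 4}]

lost = anti_join(transactions, customers, "cust_id")
lost_stats = summarize(lost, "revenue")
all_stats = summarize(transactions, "revenue")

row_share = lost_stats["count"] / all_stats["count"]
revenue_share = lost_stats["sum"] / all_stats["sum"]
print(f"rows lost: {row_share:.0%}, revenue lost: {revenue_share:.0%}")
```

In this made-up example a quarter of the rows carries three quarters of the revenue, which is the kind of imbalance the project anecdote describes.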

Evaluating the use of sampling for speed

Modern data mining practice is somewhat different from the ideal. Data miners certainly do develop valuable models that are used in the business and many have massive resources of data to mine, even more data than might have been foreseen a generation ago. But not all data miners meet the profile of a business user, someone whose primary work responsibility is not data analysis and who is not trained in, or concerned with, statistical methods. Nor does the modern data miner shy away from sampling.

In practice, it has been difficult to make discoveries and build models quickly when working with massive quantities of data. Although data mining tools may be designed to streamline the process, it still takes longer for each operation to complete on a large amount of data than it would with a smaller quantity. Thus, sampling can be extremely useful.
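
As a rough illustration of the trade-off, the following Python sketch draws a 1 percent simple random sample from a synthetic population and compares a summary statistic on the sample with the same statistic on the full data. The population, the sampling fraction, and the seeds are assumptions made for the example; the recipe itself works on a data file inside Modeler.

```python
import random
import statistics


def sample_rows(rows, fraction, seed=42):
    """Simple random sample without replacement; the seed makes the draw repeatable."""
    rng = random.Random(seed)
    k = max(1, round(len(rows) * fraction))
    return rng.sample(rows, k)


# Synthetic population standing in for a large table of a numeric field.
rng = random.Random(0)
population = [rng.gauss(50.0, 10.0) for _ in range(100_000)]

sample = sample_rows(population, 0.01)

full_mean = statistics.fmean(population)
sample_mean = statistics.fmean(sample)
print(len(sample), round(full_mean, 2), round(sample_mean, 2))
```

The sample is a hundredth of the size, so each operation on it is correspondingly cheaper, while the estimate of the mean typically stays close to the full-data value.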

Getting ready

We will start with a blank stream, and will be using the cup98lrn reduced vars2.txt data set...

Removing redundant variables using correlation matrices

In this recipe we will remove redundant variables by building a correlation matrix that identifies highly correlated variables.

Getting ready

This recipe uses the datafile nasadata.txt and the stream file recipe_variableselection_correlations.str.

You will need a copy of Microsoft Excel to visualize the correlation matrix.

How to do it...

To remove redundant variables using correlation matrices:

  1. Open the stream, recipe_variableselection_correlations.str by navigating to File | Open Stream.
  2. Make sure the datafile points to the correct path to the file nasadata.txt.
  3. Open the Type node named Correlation Types. Notice that there are several variables of type continuous whose direction values have been set to Input, and a single continuous variable has its direction set to Target. The variable set to Target can be any variable that won't be an input to the model. If you don't have a good candidate, you can create a random variable and...
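
The underlying idea, independent of the stream, can be sketched as follows: compute the Pearson correlation for every pair of continuous variables and list the pairs whose absolute correlation exceeds a cutoff, so that one variable of each pair can be dropped. The column names (band1, band2, band3), the values, and the 0.9 cutoff are hypothetical and are not taken from nasadata.txt.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists (assumes non-zero variance)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def redundant_pairs(columns, threshold=0.9):
    """Pairs of columns whose absolute correlation is at or above the (assumed) threshold."""
    pairs = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            pairs.append((a, b, round(r, 3)))
    return pairs


# Hypothetical continuous inputs; band2 is nearly a multiple of band1.
columns = {
    "band1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "band2": [2.1, 3.9, 6.2, 8.0, 9.9],
    "band3": [5.0, 1.0, 4.0, 2.0, 3.0],
}

pairs = redundant_pairs(columns)
print(pairs)
```

Which member of a flagged pair to keep is a modeling decision; the sketch only surfaces the candidates.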

Selecting variables using the CHAID Modeling node

In this recipe we will identify and select variables to include as model inputs using the CHAID node.

You will need a copy of Microsoft Excel to visualize and select the chi-square values for each variable.

Getting ready

This recipe uses the datafile cup98lrn_reduced_vars3.sav and the stream recipe_variableselection_chaid.str.

How to do it...

To identify and select variables to include as model inputs using the CHAID node:

  1. Open the stream recipe_variableselection_chaid.str by navigating to File | Open Stream and selecting the stream.
  2. Make sure the datafile points to the correct path for the file cup98lrn_reduced_vars3.sav.
  3. Open the Type node named CHAID Types. Notice that there are several variables of type continuous whose direction values have been set to Input, and a single continuous variable has its direction set to Target. The variable set to Target should be the target variable TARGET_B.
  4. Open the node TARGET_B and select the Interactive Model option...
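
To make the chi-square ranking concrete, here is a minimal Python sketch that computes the Pearson chi-square statistic between each categorical input and the target and sorts the inputs by that value. It is a simplification and not the CHAID node itself, which also merges categories and adjusts significance levels. The input names recent_donor and gender and all data values are hypothetical; only TARGET_B comes from the recipe.

```python
from collections import Counter


def chi_square(xs, ys):
    """Pearson chi-square statistic for the contingency table of two categorical lists."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    stat = 0.0
    for x in row_totals:
        for y in col_totals:
            expected = row_totals[x] * col_totals[y] / n
            observed = joint.get((x, y), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def rank_by_chi_square(inputs, target):
    """Inputs sorted by chi-square against the target, strongest association first."""
    scores = {name: chi_square(values, target) for name, values in inputs.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical data for the flag target TARGET_B.
target_b = [1, 1, 1, 1, 0, 0, 0, 0]
inputs = {
    "recent_donor": ["y", "y", "y", "n", "n", "n", "n", "y"],
    "gender": ["f", "m", "f", "m", "f", "m", "f", "m"],
}

ranking = rank_by_chi_square(inputs, target_b)
print(ranking)
```

Raw chi-square values are only comparable across inputs with the same number of categories; with differing degrees of freedom, a p-value would be the fairer yardstick.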

Selecting variables using the Means node

In this recipe we will identify and select variables to include as model inputs using the Means node.

Getting ready

This recipe uses the datafile cup98lrn_reduced_vars3.sav and the stream recipe_variableselection_means.str.

You will need a copy of Microsoft Excel to visualize the list of rules (optional).

How to do it...

To identify and select variables to include as model inputs using the Means node:

  1. Open the stream recipe_variableselection_means.str by navigating to File | Open Stream.
  2. Make sure the datafile points to the correct path to the file cup98lrn_reduced_vars3.sav.
  3. Open the Means node to look at the options. Note that the grouping variable is our target variable TARGET_B, and the test fields are all the continuous variables of interest as shown in the following figure.
    [Figure: Means node options]
  4. Run the Means node by clicking on Run.
  5. Inside the output window, click on the Importance column twice so that the variables are sorted in descending order of Importance as shown in the following...
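
A comparable ranking can be sketched in plain Python. The example below compares the mean of each continuous field between the two groups of a binary target and sorts the fields by the absolute Welch t statistic. This statistic is a stand-in chosen for the sketch, not the Importance value the Means node reports, and the field names last_gift and age and their values are hypothetical.

```python
import math
import statistics


def welch_t(group_a, group_b):
    """Welch's t statistic for the difference in means of two samples."""
    mean_a, mean_b = statistics.fmean(group_a), statistics.fmean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    return (mean_a - mean_b) / math.sqrt(var_a / len(group_a) + var_b / len(group_b))


def rank_fields(rows, target, fields):
    """Fields sorted by |t| between target == 1 and target == 0, largest first."""
    scores = {}
    for field in fields:
        positives = [row[field] for row in rows if row[target] == 1]
        negatives = [row[field] for row in rows if row[target] == 0]
        scores[field] = abs(welch_t(positives, negatives))
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical rows: last_gift differs strongly by group, age barely at all.
rows = [
    {"TARGET_B": 1, "last_gift": 20.0, "age": 50.0},
    {"TARGET_B": 1, "last_gift": 22.0, "age": 60.0},
    {"TARGET_B": 1, "last_gift": 25.0, "age": 40.0},
    {"TARGET_B": 1, "last_gift": 23.0, "age": 55.0},
    {"TARGET_B": 0, "last_gift": 10.0, "age": 52.0},
    {"TARGET_B": 0, "last_gift": 12.0, "age": 58.0},
    {"TARGET_B": 0, "last_gift": 9.0, "age": 45.0},
    {"TARGET_B": 0, "last_gift": 11.0, "age": 49.0},
]

ranking = rank_fields(rows, "TARGET_B", ["age", "last_gift"])
print(ranking)
```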

Selecting variables using single-antecedent Association Rules

In this recipe we will identify and select variables to include as model inputs using the Apriori Association Rules node. We will select the top 24 predictors based on Association Rules variable selection. We will use the same KDD Cup 1998 data set, but this version of the data was prepared with the stream Recipe - variable selection apriori data prep.str to create quintile versions of continuous variables. The target variable is the top quintile in donation amounts, TARGET_D between $20 and $200.

Getting ready

This recipe uses the datafile cup98lrn_reduced_vars3_apriori.sav and the stream Recipe - variable selection apriori.str.

You will need a copy of Microsoft Excel to visualize the list of rules.

How to do it...

To identify and select variables to include as model inputs using the Apriori Association Rules node:

  1. Open the stream Recipe - variable selection apriori.str by navigating to File | Open Stream.
  2. Make sure the datafile points...
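
The notion of a single-antecedent rule can be illustrated without Modeler. The sketch below enumerates every rule of the form "field = value implies target", computes the antecedent support, confidence, and lift of each, and then keeps the first k distinct fields in order of confidence. It is a brute-force illustration rather than the Apriori algorithm, and the field names gift_q, age_q, and top_donor and their values are hypothetical.

```python
from collections import Counter


def single_antecedent_rules(rows, target_field, target_value):
    """All rules (field == value) -> target, sorted by confidence, best first.

    Support here means the fraction of rows matching the antecedent.
    Assumes the target value occurs at least once.
    """
    n = len(rows)
    base_rate = sum(1 for row in rows if row[target_field] == target_value) / n
    antecedent = Counter()
    both = Counter()
    for row in rows:
        hit = row[target_field] == target_value
        for field, value in row.items():
            if field == target_field:
                continue
            antecedent[(field, value)] += 1
            if hit:
                both[(field, value)] += 1
    rules = []
    for item, count in antecedent.items():
        confidence = both[item] / count
        rules.append({
            "antecedent": item,
            "support": count / n,
            "confidence": confidence,
            "lift": confidence / base_rate,
        })
    return sorted(rules, key=lambda rule: rule["confidence"], reverse=True)


def top_fields(rules, k):
    """First k distinct fields in rule order."""
    seen = []
    for rule in rules:
        field = rule["antecedent"][0]
        if field not in seen:
            seen.append(field)
    return seen[:k]


# Hypothetical quintile-coded inputs and a flag for top-quintile donors.
rows = [
    {"gift_q": 5, "age_q": 1, "top_donor": 1},
    {"gift_q": 5, "age_q": 2, "top_donor": 1},
    {"gift_q": 5, "age_q": 1, "top_donor": 1},
    {"gift_q": 1, "age_q": 2, "top_donor": 0},
    {"gift_q": 1, "age_q": 1, "top_donor": 0},
    {"gift_q": 1, "age_q": 2, "top_donor": 0},
]

rules = single_antecedent_rules(rows, "top_donor", 1)
selected = top_fields(rules, 2)
print(rules[0], selected)
```

Binning continuous variables into quintiles first, as the recipe's data preparation stream does, is what lets each rule antecedent be a simple field-equals-value test.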

Key benefits

  • Go beyond mere insight and build models that you can deploy in the day-to-day running of your business
  • Save time and effort while getting more value from your data than ever before
  • Loaded with detailed step-by-step examples that show you exactly how it's done by the best in the business

Description

IBM SPSS Modeler is a data mining workbench that enables you to explore data, identify important relationships that you can leverage, and build predictive models quickly, allowing your organization to base its decisions on hard data, not hunches or guesswork. IBM SPSS Modeler Cookbook takes you beyond the basics and shares the tips, the timesavers, and the workarounds that experts use to increase productivity and extract maximum value from data. The authors of this book are among the very best of these exponents, gurus who, in their brilliant and imaginative use of the tool, have pushed back the boundaries of applied analytics. By reading this book, you are learning from practitioners who have helped define the state of the art. Follow the industry-standard data mining process, gaining new skills at each stage, from loading data to integrating results into everyday business practices. Get a handle on the most efficient ways of extracting data from your own sources and preparing it for exploration and modeling. Master the best methods for building models that will perform well in the workplace. Go beyond the basics and get the full power of your data mining workbench with this practical guide.

Who is this book for?

If you have had some hands-on experience with IBM SPSS Modeler and now want to go deeper and take more control over your data mining process, this is the guide for you. It is ideal for practitioners who want to break into advanced analytics.

What you will learn

  • Use and understand the industry-standard CRISP-DM process for data mining.
  • Assemble data simply, quickly, and correctly using the full power of extraction, transformation, and loading (ETL) tools.
  • Control the amount of time you spend organizing and formatting your data.
  • Develop predictive models that stand up to the demands of real-life applications.
  • Take your modeling to the next level beyond default settings and learn the tips that the experts use.
  • Learn why the best model is not always the most accurate one.
  • Master deployment techniques that put your discoveries to work, making the most of your business's most critical resources.
  • Challenge yourself with scripting for ultimate control and automation - it's easier than you think!
Estimated delivery fee Deliver to United States

Economy delivery 10 - 13 business days

Free $6.95

Premium delivery 6 - 9 business days

$21.95
(Includes tracking information)

Product Details

Publication date : Oct 24, 2013
Length: 382 pages
Edition : 1st
Language : English
ISBN-13 : 9781849685467
Vendor : IBM


Packt Subscriptions

See our plans and pricing
$19.99 billed monthly
  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive Early access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Simple pricing, no contract

$199.99 billed annually
  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive Early access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Choose a DRM-free eBook or Video every month to keep
  • PLUS own as many other DRM-free eBooks or Videos as you like for just $5 each
  • Exclusive print discounts

$279.99 billed in 18 months
  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive Early access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Choose a DRM-free eBook or Video every month to keep
  • PLUS own as many other DRM-free eBooks or Videos as you like for just $5 each
  • Exclusive print discounts

Frequently bought together


IBM SPSS Modeler Essentials: $38.99
IBM SPSS Modeler Cookbook: $70.99
Total: $109.98

Table of Contents

10 Chapters
1. Data Understanding
2. Data Preparation – Select
3. Data Preparation – Clean
4. Data Preparation – Construct
5. Data Preparation – Integrate and Format
6. Selecting and Building a Model
7. Modeling – Assessment, Evaluation, Deployment, and Monitoring
8. CLEM Scripting
A. Business Understanding
Index

Customer reviews

Rating distribution
4.4 out of 5 (20 Ratings)
5 star 65%
4 star 20%
3 star 5%
2 star 5%
1 star 5%

Top Reviews

Amazon Customer, Dec 09, 2014
Rating: 5 out of 5
Short of paying a fortune to IBM for training. This all you need to get started esp if you have any kind of background with SAS or SPSS
Amazon Verified review

Steve F., Feb 01, 2015
Rating: 5 out of 5
Excellent book. SPSS Modeler comes with a ton of really good demos and this book covers everything the demos don't.
Amazon Verified review

Gordon Curzon, Jan 09, 2014
Rating: 5 out of 5
A must read for all SPSS users even if just for a refresh. Love the way the book is structured.
Amazon Verified review

Kamau Njenga, Mar 09, 2014
Rating: 5 out of 5
This book is a must have for anyone learning to use SPSS modeler or looking to learn data mining with SPSS. The examples were great and easy to follow.
Amazon Verified review

Terry Taerum, Dec 14, 2013
Rating: 5 out of 5
In the competitive world of analytical consulting, when getting the job done in a reasonable time is critical, it's helpful to know what others have done to solve similar challenges, particularly when using IBM SPSS Modeler. The text, combined with data samples and example streams delivers the shortest possible distance between problem and solution.
Amazon Verified review

FAQs

What is the digital copy I get with my Print order?

When you buy any Print edition of our Books, you can redeem (for free) the eBook edition of the Print Book you’ve purchased. This gives you instant access to your book when you make an order via PDF, EPUB or our online Reader experience.

What is the delivery time and cost of print books?

Shipping Details

USA:

Economy: Delivery to most addresses in the US within 10-15 business days

Premium: Trackable Delivery to most addresses in the US within 3-8 business days

UK:

Economy: Delivery to most addresses in the U.K. within 7-9 business days.
Shipments are not trackable

Premium: Trackable delivery to most addresses in the U.K. within 3-4 business days!
Add one extra business day for deliveries to Northern Ireland and Scottish Highlands and islands

EU:

Premium: Trackable delivery to most EU destinations within 4-9 business days.

Australia:

Economy: Can deliver to P. O. Boxes and private residences.
Trackable service with delivery to addresses in Australia only.
Delivery time ranges from 7-9 business days for VIC and 8-10 business days for Interstate metro
Delivery time is up to 15 business days for remote areas of WA, NT & QLD.

Premium: Delivery to addresses in Australia only
Trackable delivery to most P. O. Boxes and private residences in Australia within 4-5 days based on the distance to a destination following dispatch.

India:

Premium: Delivery to most Indian addresses within 5-6 business days

Rest of the World:

Premium: Countries in the American continent: Trackable delivery to most countries within 4-7 business days

Asia:

Premium: Delivery to most Asian addresses within 5-9 business days

Disclaimer:
All orders received before 5 PM U.K time would start printing from the next business day. So the estimated delivery times start from the next day as well. Orders received after 5 PM U.K time (in our internal systems) on a business day or anytime on the weekend will begin printing the second to next business day. For example, an order placed at 11 AM today will begin printing tomorrow, whereas an order placed at 9 PM tonight will begin printing the day after tomorrow.


Unfortunately, due to several restrictions, we are unable to ship to the following countries:

  1. Afghanistan
  2. American Samoa
  3. Belarus
  4. Brunei Darussalam
  5. Central African Republic
  6. The Democratic Republic of Congo
  7. Eritrea
  8. Guinea-Bissau
  9. Iran
  10. Lebanon
  11. Libyan Arab Jamahiriya
  12. Somalia
  13. Sudan
  14. Russian Federation
  15. Syrian Arab Republic
  16. Ukraine
  17. Venezuela
What is custom duty/charge?

Customs duties are charges levied on goods when they cross international borders. They are a tax imposed on imported goods. These duties are charged by special authorities and bodies created by local governments and are meant to protect local industries, economies, and businesses.

Do I have to pay customs charges for the print book order?

Orders shipped to the countries listed under EU27 will not bear customs charges. They are paid by Packt as part of the order.

List of EU27 countries: www.gov.uk/eu-eea

For shipments to countries outside the EU27, customs duty or localized taxes may apply. These are charged by the recipient country, must be paid by the customer, and are not included in the shipping charges on the order.

How do I know my custom duty charges?

The amount of duty payable varies greatly depending on the imported goods, the country of origin and several other factors like the total invoice amount or dimensions like weight, and other such criteria applicable in your country.

For example:

  • If you live in Mexico, and the declared value of your ordered items is over $ 50, for you to receive a package, you will have to pay additional import tax of 19% which will be $ 9.50 to the courier service.
  • Whereas if you live in Turkey, and the declared value of your ordered items is over € 22, for you to receive a package, you will have to pay additional import tax of 18% which will be € 3.96 to the courier service.
How can I cancel my order?

Cancellation Policy for Published Printed Books:

You can cancel any order within 1 hour of placing the order. Simply contact customercare@packt.com with your order details or payment transaction id. If your order has already started the shipment process, we will do our best to stop it. However, if it is already on the way to you then when you receive it, you can contact us at customercare@packt.com using the returns and refund process.

Please understand that Packt Publishing cannot provide refunds or cancel any order except in the cases described in our Return Policy (i.e. Packt Publishing agrees to replace your printed book because it arrives damaged or has a material defect). Otherwise, Packt Publishing will not accept returns.

What is your returns and refunds policy?

Return Policy:

We want you to be happy with your purchase from Packtpub.com. We will not hassle you with returning print books to us. If the print book you receive from us is incorrect, damaged, doesn't work or is unacceptably late, please contact Customer Relations Team on customercare@packt.com with the order number and issue details as explained below:

  1. If you ordered (eBook, Video or Print Book) incorrectly or accidentally, please contact Customer Relations Team on customercare@packt.com within one hour of placing the order and we will replace/refund you the item cost.
  2. Sadly, if your eBook or Video file is faulty or a fault occurs during the eBook or Video being made available to you, i.e. during download then you should contact Customer Relations Team within 14 days of purchase on customercare@packt.com who will be able to resolve this issue for you.
  3. You will have a choice of replacement or refund of the problem items (damaged, defective or incorrect).
  4. Once Customer Care Team confirms that you will be refunded, you should receive the refund within 10 to 12 working days.
  5. If you are only requesting a refund of one book from a multiple order, then we will refund you the appropriate single item.
  6. Where the items were shipped under a free shipping offer, there will be no shipping costs to refund.

On the off chance your printed book arrives damaged or with a material defect, contact our Customer Relations Team on customercare@packt.com within 14 days of receipt of the book with appropriate evidence of damage and we will work with you to secure a replacement copy, if necessary. Please note that each printed book you order from us is individually made by Packt's professional book-printing partner on a print-on-demand basis.

What tax is charged?

Currently, no tax is charged on the purchase of any print book (subject to change based on the laws and regulations). A localized VAT fee is charged only to our European and UK customers on eBooks, Video and subscriptions that they buy. GST is charged to Indian customers for eBooks and video purchases.

What payment methods can I use?

You can pay with the following card types:

  1. Visa Debit
  2. Visa Credit
  3. MasterCard
  4. PayPal