Selecting cases
Often during a data mining project, you will need to select a subset of records. For example, you might want to build a model that only includes people who have certain characteristics (such as customers who have purchased something within the last six months). The Select node is used when you want to select or discard a subset of records based on a specific condition. Let's go through an example of how to use the Select node:
- Open the Cleaning and Selecting data stream.
- Click on the Record Ops palette.
- Double-click on the Select node so that it is connected to the Var.File node.
- Right-click on the Select node, then click Edit.
The Select node allows users to specify an expression either directly in the Condition text box or with the Expression Builder (the calculator icon). The Mode option allows you to choose whether to select (Include) or delete (Discard) records that satisfy the condition.
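To make the Include and Discard modes concrete, here is a minimal sketch in plain Python. It is not the Modeler API or its expression language; the function `select_records`, the sample customer records, and the 182-day approximation of "six months" are all assumptions made for illustration.

```python
from datetime import date, timedelta


def select_records(records, condition, mode="Include"):
    """Mimic the Select node's two modes (illustrative helper, not a Modeler function).

    Include keeps the records for which the condition is true;
    Discard keeps the records for which it is false.
    """
    if mode not in ("Include", "Discard"):
        raise ValueError("mode must be 'Include' or 'Discard'")
    keep_when = mode == "Include"
    return [r for r in records if bool(condition(r)) == keep_when]


# Hypothetical data: three customers and the date of their last purchase.
today = date(2024, 7, 1)
customers = [
    {"id": 1, "last_purchase": date(2024, 5, 20)},
    {"id": 2, "last_purchase": date(2023, 11, 2)},
    {"id": 3, "last_purchase": date(2024, 1, 15)},
]

# Assumption: "within the last six months" is approximated as 182 days.
cutoff = today - timedelta(days=182)


def purchased_recently(record):
    """The selection condition: last purchase on or after the cutoff date."""
    return record["last_purchase"] >= cutoff


included = select_records(customers, purchased_recently, mode="Include")
discarded = select_records(customers, purchased_recently, mode="Discard")

print([r["id"] for r in included])   # customers who satisfy the condition
print([r["id"] for r in discarded])  # customers who do not
```

The same condition drives both calls; only the mode changes which side of the split is passed downstream, which is how the Mode option behaves in the node's dialog.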
Note
You can type the selection criteria into the Condition text box, but it is often...