Splitting a dataset into a training and test set
In this recipe, you will split the data into training and test sets using the SSIS Percentage Sampling transformation. You will use 70 percent of the data for the training set and the remaining 30 percent for the test set.
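To make the idea of percentage sampling concrete, here is a minimal Python sketch of a 70/30 split. It is an illustration only, not how SSIS implements the transformation: the function name `percentage_split`, the seed value, and the stand-in row list are all assumptions made for this example. Each row is routed independently to one of two outputs, so the resulting sizes are only approximately 70 and 30 percent.

```python
import random


def percentage_split(rows, percent=70, seed=42):
    """Route each row to a 'selected' or 'unselected' list.

    Hypothetical helper for illustration: every row is selected
    independently with probability percent/100, so the split is
    approximate rather than exact.
    """
    rng = random.Random(seed)  # fixed seed makes the split repeatable
    selected, unselected = [], []
    for row in rows:
        if rng.random() * 100 < percent:
            selected.append(row)
        else:
            unselected.append(row)
    return selected, unselected


# Stand-in for the source rows (assumption: 10,000 integer keys).
rows = list(range(1, 10001))
train, test = percentage_split(rows, percent=70, seed=42)
print(len(train), len(test))
```

Because every row lands in exactly one of the two lists, the training and test sets never overlap, and reusing the same seed reproduces the same split.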
Getting ready
This recipe has no special prerequisites beyond having SSIS 2016 installed and the AdventureWorksDW2014 database available in your SQL Server instance.
How to do it...
Open SQL Server Data Tools (SSDT) and create a new project using the Integration Services Project template. Place the solution in the C:\SSIS2016Cookbook folder and name the project Chapter08.
- Rename the default package to SplitData.dtsx.
- In the Control Flow tab in the Package Designer, add a new data flow task by dragging and dropping it from the SSIS toolbox.
- Right-click the task and select Rename from the pop-up menu. Change the task's name to SplitData.
- Click the Data Flow tab.
- Create a new OLE DB source. Name it AW_DW_Source.
- Double-click the AW_DW_Source...