Introduction
Advanced analytics, including statistics, data mining, and machine learning, has become very popular in recent years. You can use SSIS to prepare the data you need for further analysis. Often, you need to prepare a random sample of your data. For predictive algorithms, you typically split the data into a training set, which you use to train multiple models, and a test set, on which you perform predictions to see which model gives you the best results. You can use the row sampling and percentage sampling transformations to create random samples.
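To make the idea of a random training/test split concrete, here is a minimal Python sketch. It is not how the SSIS sampling transformations are implemented; it only illustrates routing each row at random to one of two outputs. The function name `percentage_split`, the 70 percent training fraction, and the fixed seed are all assumptions chosen for illustration.

```python
import random


def percentage_split(rows, train_fraction=0.7, seed=42):
    """Randomly route each row to a training set or a test set.

    Hypothetical helper for illustration only. Each row is assigned
    independently, so the training set holds approximately (not exactly)
    train_fraction of the rows.
    """
    rng = random.Random(seed)  # fixed seed makes the split repeatable
    train, test = [], []
    for row in rows:
        if rng.random() < train_fraction:
            train.append(row)
        else:
            test.append(row)
    return train, test


# Example input: 1,000 row identifiers standing in for source rows.
rows = list(range(1000))
train, test = percentage_split(rows)
```

Every input row lands in exactly one of the two sets, so models trained on `train` can be compared on rows in `test` that they have never seen. Reusing the same seed reproduces the same split.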
In the SQL Server suite, you can use SQL Server Analysis Services (SSAS), installed in multidimensional and data mining mode, to create data mining models. In addition, starting with SQL Server 2016, you can also use the R language to do nearly any kind of advanced analysis you want. In this chapter, you will learn how to use both SSAS and R models in the SSIS data flow.
You will use the data mining query transformation for...