Random sampling – splitting a dataset into training and testing datasets
Splitting a dataset into training and testing datasets is an operation every predictive modeller has to perform before applying a model, irrespective of the kind of data in hand or the predictive model being applied. The two types of datasets are described as follows:
The training dataset is the one on which the model is built. The calculations are performed on it, and the model equations and parameters are derived from it.
The testing dataset is used to check the accuracy of the model. The model equations and parameters are used to calculate outputs from the inputs in the testing dataset. These outputs are then compared with the actual values present in the testing dataset to assess how well the model performs.
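As a minimal sketch of the two roles described above, the following Python snippet randomly splits a small synthetic dataset, builds a straight-line model on the training rows only, and checks it against the actuals in the testing rows. The function name `random_split`, the 80/20 split ratio, the seed, and the synthetic data are all assumptions made for illustration; they are not taken from the text.

```python
import numpy as np

def random_split(n_rows, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx): disjoint, randomly chosen row indices."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_rows)           # random ordering of 0..n_rows-1
    n_test = int(round(n_rows * test_fraction))  # number of rows held out for testing
    return shuffled[n_test:], shuffled[:n_test]

# Synthetic data (an assumption for this sketch): y is roughly 3*x + 1 plus noise.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 50)
y = 3.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)

train_idx, test_idx = random_split(len(x), test_fraction=0.2, seed=42)

# Build the model (here, a fitted line) on the training dataset only.
slope, intercept = np.polyfit(x[train_idx], y[train_idx], deg=1)

# Apply the fitted parameters to the testing inputs and compare with the actuals.
predicted = slope * x[test_idx] + intercept
mean_abs_error = np.mean(np.abs(predicted - y[test_idx]))
print(f"train rows: {len(train_idx)}, test rows: {len(test_idx)}")
print(f"mean absolute error on the testing dataset: {mean_abs_error:.3f}")
```

Fixing the seed makes the split reproducible, which helps when comparing different models on the same testing rows.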
This will become clearer from the following image:
Generally, the...