CrossValidation and hyperparameter tuning
We will look at one example each of CrossValidation and hyperparameter tuning, starting with CrossValidation.
CrossValidation
As stated before, we've used the default parameters of the machine learning algorithm, and we don't know whether they are a good choice. In addition, instead of simply splitting your data into training and testing sets, or training, testing, and validation sets, CrossValidation might be a better choice because it makes sure that eventually all of the data is seen by the machine learning algorithm.
Note
CrossValidation basically splits your complete available training data into k folds, where the parameter k can be specified. Then, the whole Pipeline is run once for every fold, and one machine learning model is trained for each fold. Finally, the resulting machine learning models are joined; this is done by a voting scheme for classifiers or by averaging for regression.
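To make this more concrete, here is a minimal sketch of wrapping a Pipeline in Spark ML's CrossValidator with k = 10 folds, assuming a pyspark environment; the toy DataFrame, the column names, and the LogisticRegression stage are illustrative assumptions rather than the example developed in this chapter:

```python
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

spark = SparkSession.builder.getOrCreate()

# Hypothetical training data: two numeric features and a binary label
df = spark.createDataFrame(
    [(1.0, 2.0, 0.0), (2.0, 1.0, 1.0), (3.0, 4.0, 0.0), (4.0, 3.0, 1.0)] * 10,
    ["f1", "f2", "label"],
)

# A simple Pipeline: assemble the feature vector, then train a classifier
assembler = VectorAssembler(inputCols=["f1", "f2"], outputCol="features")
lr = LogisticRegression(featuresCol="features", labelCol="label")
pipeline = Pipeline(stages=[assembler, lr])

# An empty parameter grid: we only want k-fold evaluation here,
# not a hyperparameter search yet
paramGrid = ParamGridBuilder().build()

cv = CrossValidator(
    estimator=pipeline,            # the whole Pipeline is run for every fold
    estimatorParamMaps=paramGrid,
    evaluator=BinaryClassificationEvaluator(labelCol="label"),
    numFolds=10,                   # k = 10 folds
)

cvModel = cv.fit(df)               # trains one model per fold internally
print(cvModel.avgMetrics)          # evaluator metric averaged over the folds
```

Here numFolds corresponds to k, and cvModel.avgMetrics reports the evaluator's metric averaged across the folds, which is what makes CrossValidation useful for judging how well the Pipeline generalizes.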
The following figure illustrates ten-fold CrossValidation...