Chapter 7: Model Improvements
Activity 12: Perform Repeated K-Fold Cross Validation and Grid Search Optimization
Load the required packages mlbench, caret, and dplyr for the exercise:
library(mlbench)
library(dplyr)
library(caret)
Load the PimaIndiansDiabetes dataset into memory from the mlbench package:
data(PimaIndiansDiabetes)
df <- PimaIndiansDiabetes
Set the seed to 2019 for reproducibility:
set.seed(2019)
Define the k-fold validation object using the trainControl function from the caret package, setting method = "repeatedcv" instead of "cv". Add the repeats = 10 argument to trainControl to specify how many times the cross-validation is repeated:
train_control <- trainControl(method = "repeatedcv", number = 5, repeats = 10, savePredictions = TRUE, verboseIter = TRUE)
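Note that with number = 5 and repeats = 10, caret evaluates the model on 5 × 10 = 50 resamples for each candidate hyperparameter value. A quick sketch of the underlying fold structure, using caret's createMultiFolds on the diabetes outcome (assuming df and the seed are already set as above), is:

```r
# Each of the 10 repeats partitions the data into 5 folds,
# so every mtry value is assessed on 5 * 10 = 50 resamples.
folds <- createMultiFolds(df$diabetes, k = 5, times = 10)
length(folds)  # 50 resample index sets
```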
Define the grid for the random forest hyperparameter mtry as (3, 4, 5):
parameter_values <- expand.grid(mtry = c(3, 4, 5))
Fit the model using the grid values, the cross-validation object, and the random forest classifier:
model_rf_kfold <- train(diabetes ~ ., data = df, trControl = train_control, method = "rf", metric = "Accuracy", tuneGrid = parameter_values)
Study the model performance by printing the average accuracy and standard deviation of accuracy:
print(paste("Average Accuracy :", mean(model_rf_kfold$resample$Accuracy)))
print(paste("Std. Dev Accuracy :", sd(model_rf_kfold$resample$Accuracy)))
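Since dplyr is loaded for this exercise, the same summary can also be written as a pipeline over the resample data frame that caret stores on the fitted model:

```r
# Summarise the 50 per-resample accuracy values in one pipeline
model_rf_kfold$resample %>%
  summarise(mean_acc = mean(Accuracy), sd_acc = sd(Accuracy))
```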
Study the model performance by plotting the accuracy across different values of the hyperparameter:
plot(model_rf_kfold)
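Beyond the plot, you can query the fitted caret object directly: bestTune holds the winning mtry value, and results holds the mean accuracy (and its standard deviation) for each grid value:

```r
# Best hyperparameter chosen by the grid search
print(model_rf_kfold$bestTune)
# Mean accuracy and its standard deviation per mtry value
print(model_rf_kfold$results[, c("mtry", "Accuracy", "AccuracySD")])
```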
The final output is as follows:
The plot shows the cross-validated accuracy for each candidate value of mtry, demonstrating that repeated k-fold cross-validation and grid search optimization were performed on the same model.