Since the training dataset is small, we will perform five-fold cross-validation to get a better sense of the model's ability to generalize to new data. We will also use all five models trained in the different cross-validation folds for inference. The probability that a test data point belongs to a given class is the average of the probabilities predicted by the five models, which is represented as follows:

$$P(y = c \mid x) = \frac{1}{5} \sum_{k=1}^{5} P_k(y = c \mid x)$$

Here, $P_k(y = c \mid x)$ denotes the probability that the model trained in fold $k$ assigns to class $c$ for the test data point $x$.
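The averaging over fold models can be illustrated with a minimal sketch in plain Python. The function name `average_fold_probabilities` and the probability values are hypothetical and chosen only for illustration; in practice, each inner table would come from the prediction call of the model trained in one fold.

```python
from typing import List


def average_fold_probabilities(
    fold_probs: List[List[List[float]]],
) -> List[List[float]]:
    """Average class probabilities over the fold models.

    fold_probs[k][i][c] is the probability that the model from fold k
    assigns to class c for test point i.
    """
    n_folds = len(fold_probs)
    n_points = len(fold_probs[0])
    n_classes = len(fold_probs[0][0])
    return [
        [
            sum(fold_probs[k][i][c] for k in range(n_folds)) / n_folds
            for c in range(n_classes)
        ]
        for i in range(n_points)
    ]


# Hypothetical outputs of five fold models for two test points, three classes.
fold_probs = [
    [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]],
    [[0.6, 0.3, 0.1], [0.2, 0.3, 0.5]],
    [[0.8, 0.1, 0.1], [0.1, 0.5, 0.4]],
    [[0.5, 0.4, 0.1], [0.3, 0.3, 0.4]],
    [[0.9, 0.05, 0.05], [0.1, 0.2, 0.7]],
]

avg_probs = average_fold_probabilities(fold_probs)
```

Because every model outputs a probability distribution over the classes, each averaged row still sums to one. With the predictions stacked in a NumPy array of shape (folds, points, classes), the same result is obtained with a mean over the first axis.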
Since the aim is to predict the actual classes and not the probabilities, we select the class that has the maximum probability. This approach applies when we are working with a classification network and a classification cost function. If we treat the problem as a regression problem, the process needs a few alterations, which we will discuss later on.
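For the classification case, selecting the class with the maximum probability can be sketched as follows. The helper name `predict_classes` and the probability values are hypothetical; the rows stand in for averaged per-class probabilities of two test points.

```python
from typing import List


def predict_classes(avg_probs: List[List[float]]) -> List[int]:
    """Return, for each test point, the index of its most probable class.

    If two classes tie, the lowest class index is returned.
    """
    return [max(range(len(row)), key=row.__getitem__) for row in avg_probs]


# Hypothetical averaged probabilities for two test points and three classes.
avg_probs = [
    [0.70, 0.21, 0.09],
    [0.16, 0.32, 0.52],
]

predicted = predict_classes(avg_probs)
print(predicted)  # [0, 2]
```

This is the row-wise argmax of the probability table. It is only meaningful when the network outputs class probabilities, which is why the regression formulation needs a different final step.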