Model Validation and Testing
With so much information now available online, almost anybody can start working on a machine learning project. However, choosing the right algorithm for your data is a challenge when there are many options available. As a result, the choice of one algorithm over another is usually made through trial and error, by testing different alternatives and comparing their results.
Moreover, arriving at a good model involves not only selecting the algorithm but also tuning its hyperparameters. A conventional way to do this is to divide the data into three parts (training, validation, and testing sets), which will be explained further in the next section.
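As a rough illustration of the three-part division mentioned above, the following minimal sketch shuffles a dataset and cuts it into training, validation, and testing sets. The function name three_way_split, the 60/20/20 proportions, and the fixed seed are assumptions made for this example only; the text does not prescribe any particular ratio or implementation.

```python
import random


def three_way_split(rows, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle rows and split them into training, validation, and testing lists.

    The default fractions (20% validation, 20% testing) are illustrative
    assumptions, not values taken from the text.
    """
    if not 0 < val_fraction + test_fraction < 1:
        raise ValueError("fractions must leave some data for training")

    # Copy before shuffling so the caller's data is left untouched.
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)

    n_val = round(len(shuffled) * val_fraction)
    n_test = round(len(shuffled) * test_fraction)
    n_train = len(shuffled) - n_val - n_test

    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Example with 100 hypothetical instances, identified by index.
train, val, test = three_way_split(range(100))
print(len(train), len(val), len(test))
```

Shuffling before slicing is one simple way to avoid an ordering effect in the original data leaking into a single subset. In practice a library helper would often be used instead of hand-written slicing.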
Data Partitioning
Data partitioning is the process of dividing a dataset into three subsets so that each one can be used for a different purpose. Keeping the subsets separate means that the data used to evaluate a model plays no part in fitting it, which helps keep bias out of the model's development. The following is an explanation of each...