Let's see how to split our data properly into training and testing datasets. As we said in Chapter 1, The Realm of Supervised Learning, in the Building a linear regressor recipe, when we build a machine learning model, we need a way to validate our model to check whether it is performing at a satisfactory level. To do this, we need to separate our data into two groups—a training dataset and a testing dataset. The training dataset will be used to build the model, and the testing dataset will be used to see how this trained model performs on unknown data.
In this recipe, we will learn how to split the dataset for training and testing phases.