It's time to split the data into train and test sets. Bear in mind that some people like to split their data three ways: train, validation, and test. For now, though, we'll keep things simple and use just train and test.
First, we will split the data into train_data and test_data. We are going to use train_data to train the model and test_data to evaluate its predictions. We are going to have an 80-20 split:
In[19]: train_data = dataset.sample(frac=0.8, random_state=0)
In[20]: test_data = dataset.drop(train_data.index)
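To see what the sample-then-drop pattern does, here is a minimal sketch on a small made-up frame. The name toy and its values are hypothetical stand-ins for the chapter's dataset, used only so the snippet runs on its own:

```python
import pandas as pd

# Hypothetical stand-in for the chapter's `dataset`; the values are made up.
toy = pd.DataFrame({
    "MPG": [18.0, 15.0, 36.0, 26.0, 24.0, 30.0, 22.0, 19.0, 28.0, 33.0],
    "Horsepower": [130, 165, 58, 90, 95, 70, 110, 125, 80, 65],
})

# Draw 80% of the rows at random; random_state makes the draw repeatable.
toy_train = toy.sample(frac=0.8, random_state=0)

# Every row that was not sampled goes to the test set.
toy_test = toy.drop(toy_train.index)

# The two sets should share no index labels.
overlap = toy_train.index.intersection(toy_test.index)
print(len(toy_train), len(toy_test), len(overlap))
```

Note that drop removes rows by index label, so this pattern relies on the index labels being unique. If the index contains duplicates, resetting it first (for example with reset_index(drop=True)) avoids dropping more rows than intended.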
Now, we will separate the MPG label from the train and test data:
In[21]: train_labels = train_data.pop('MPG')
In[22]: test_labels = test_data.pop('MPG')
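It is worth knowing that pop both returns the column and removes it from the DataFrame in place, so after these calls the feature frames no longer contain MPG. A small sketch, again using a hypothetical frame named toy with made-up values:

```python
import pandas as pd

# Hypothetical two-column frame; the values are made up.
toy = pd.DataFrame({"MPG": [18.0, 15.0, 36.0], "Horsepower": [130, 165, 58]})

# pop returns the column as a Series and deletes it from `toy` in place.
labels = toy.pop("MPG")

print(list(toy.columns))  # only the feature column remains
print(labels.tolist())    # the label values that were removed
```

Because the change happens in place, running the same pop cell a second time raises a KeyError, since the column is already gone.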
In the next section, we will normalize the dataset, as this helps improve the performance of the model.