Evaluating the performance of linear regression models
In the previous section, we discussed how to fit a regression model on training data. However, you learned in previous chapters that it is crucial to test the model on data that it hasn't seen during training to obtain an unbiased estimate of its performance.
As we remember from Chapter 6, Learning Best Practices for Model Evaluation and Hyperparameter Tuning, we want to split our dataset into separate training and test datasets, using the former to fit the model and the latter to evaluate how well it generalizes to unseen data. Instead of proceeding with the simple regression model, we will now use all variables in the dataset to train a multiple regression model:
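>>> from sklearn.model_selection import train_test_split
>>> # use every column except the last one (the target, MEDV) as features
>>> X = df.iloc[:, :-1].values
>>> y = df['MEDV'].values
>>> # hold out 30% of the samples as the test set
>>> X_train, X_test, y_train, y_test = train_test_split(
...     X, y, test_size=0.3, random_state=0)
>>...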
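The hold-out procedure described above can be sketched end to end as follows. Because the housing DataFrame `df` is not available outside this chapter, the sketch assumes a synthetic dataset with three explanatory variables and known coefficients. It scores the model with `mean_squared_error` and `r2_score` from `sklearn.metrics`, which are common choices for regression but are an assumption here rather than necessarily the metrics this section goes on to use.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the housing data (assumption: df is not available here).
# Three explanatory variables with known coefficients, an intercept of 4.0,
# and Gaussian noise with standard deviation 1.0.
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(200, 3))
true_coef = np.array([1.5, -2.0, 0.5])
y = X @ true_coef + 4.0 + rng.normal(scale=1.0, size=200)

# Hold out 30% of the samples; the model never sees them during fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Fit a multiple regression model on the training split only.
model = LinearRegression()
model.fit(X_train, y_train)

# Predict on both splits so training and test performance can be compared.
y_train_pred = model.predict(X_train)
y_test_pred = model.predict(X_test)

mse_train = mean_squared_error(y_train, y_train_pred)
mse_test = mean_squared_error(y_test, y_test_pred)
r2_train = r2_score(y_train, y_train_pred)
r2_test = r2_score(y_test, y_test_pred)

print(f"MSE train: {mse_train:.3f}, test: {mse_test:.3f}")
print(f"R^2 train: {r2_train:.3f}, test: {r2_test:.3f}")
```

Similar error values on the two splits suggest the model is not overfitting the training data; a test error much larger than the training error would indicate the opposite.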