Let's start with a simple problem, namely, predicting house prices in Boston. The problem is as follows: we are given several demographic and geographical attributes, such as the crime rate or the pupil-teacher ratio in the neighborhood. The goal is to predict the median value of a house in a particular area. As in the case of classification, we have some training data and want to build a model that can be generalized to other data.
This is one of the built-in datasets that scikit-learn comes with, so it is very easy to load the data into memory:
from sklearn.datasets import load_boston
boston = load_boston()
The boston object contains several attributes; in particular, boston.data contains the input data and boston.target contains the price of houses in thousands of dollars.
We will start with a simple one-dimensional regression,...