Predicting house prices with regression
Let us start with a simple problem: predicting house prices in Boston.
We can use a publicly available dataset. We are given several demographic and geographical attributes, such as the crime rate or the pupil-teacher ratio, and the goal is to predict the median value of a house in a particular area. As usual, we have some training data, where the answer is known to us.
We start by using scikit-learn's methods to load the dataset. This is one of the built-in datasets that scikit-learn comes with, so it is very easy:
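# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this call requires an older scikit-learn release.
from sklearn.datasets import load_boston
boston = load_boston()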
The boston object is a composite object with several attributes; in particular, boston.data and boston.target will be of interest to us.
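As a rough sketch of how these two attributes relate, the snippet below builds a small stand-in object with the same two attribute names. The name toy, the three-column layout, and every number in it are assumptions made up for illustration; they are not rows of the real dataset, and plain lists are used here in place of the NumPy arrays that scikit-learn returns.

```python
from types import SimpleNamespace

# Hypothetical stand-in for the loaded dataset object.
# All values are invented for illustration only.
toy = SimpleNamespace(
    # One row per area, one column per attribute.
    data=[
        [0.1, 6.5, 15.3],
        [0.3, 5.9, 17.8],
        [0.2, 7.1, 18.7],
        [0.9, 5.4, 20.2],
    ],
    # One value to predict per row of data.
    target=[24.0, 21.5, 34.0, 16.5],
)

n_samples = len(toy.data)
n_features = len(toy.data[0])

# data and target are aligned row by row.
assert len(toy.target) == n_samples

# Select a single attribute (column) by its position.
column = 1
single_feature = [row[column] for row in toy.data]

print(n_samples, n_features)
print(single_feature)
```

With the real object, boston.data is a two-dimensional NumPy array, so the same information is available as boston.data.shape, and a column is selected with boston.data[:, column].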
We will start with a simple one-dimensional regression, trying to regress the price on a single attribute: the average number of rooms per dwelling, which is stored at position 5 (you can consult boston.DESCR and boston.feature_names...