Linear regression with scikit-learn and higher dimensionality
scikit-learn offers the class LinearRegression, which works with n-dimensional spaces. For this purpose, we're going to use the Boston dataset:
>>> from sklearn.datasets import load_boston
>>> boston = load_boston()
>>> boston.data.shape
(506L, 13L)
>>> boston.target.shape
(506L,)
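Note that load_boston was removed from scikit-learn in version 1.2, so the import above fails on recent installations. As a minimal sketch of how LinearRegression handles an n-dimensional input, the following example fits the model on synthetic data with the same shape as the Boston dataset (506 samples, 13 features). The random data, the true_coef weights, and the intercept of 2.0 are assumptions made only for illustration; they are not the Boston values.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in with the same shape as the Boston data:
# 506 samples, 13 input features, one output (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(506, 13))
true_coef = rng.normal(size=13)  # hypothetical ground-truth weights
y = X @ true_coef + 2.0 + rng.normal(scale=0.1, size=506)

# LinearRegression accepts any number of input features.
model = LinearRegression()
model.fit(X, y)

print(model.coef_.shape)   # one coefficient per input feature
print(model.intercept_)    # estimated bias term
print(model.score(X, y))   # R^2 computed on the training data
```

The fitted model exposes one coefficient per feature in coef_ and a single intercept_, regardless of how many dimensions the input has.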
The dataset has 506 samples with 13 input features and one output. In the following figure, there's a collection of plots of the first 12 features:
Note
When working with datasets, it's useful to have a tabular view to manipulate data. pandas is a perfect framework for this task, and even though it's beyond the scope of this book, I suggest you create a data frame with the command pandas.DataFrame(boston.data, columns=boston.feature_names) and use Jupyter to visualize it. For further information, refer to Heydt M., Learning pandas - Python Data Discovery and Analysis Made Easy, Packt.
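The tabular view suggested in the note can be sketched as follows. Because the Boston loader is not available in recent scikit-learn releases, this example assumes a synthetic array with the same shape (506, 13) and hypothetical column names f0 to f12. With the original dataset, boston.data and boston.feature_names would take the place of data and feature_names.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for boston.data: 13 features with deliberately
# different scales (illustrative values, not the real dataset).
rng = np.random.default_rng(0)
data = rng.normal(size=(506, 13)) * np.linspace(1.0, 100.0, 13)
feature_names = ["f%d" % i for i in range(13)]  # hypothetical names

# Same call as in the note, applied to the stand-in array.
df = pd.DataFrame(data, columns=feature_names)

# First rows of the table (rendered as an HTML table in Jupyter).
print(df.head())

# Per-feature summary: a quick way to compare scales across features.
summary = df.describe().T[["mean", "std", "min", "max"]]
print(summary)
```

A per-feature summary like this is a quick way to see how much the scales differ before fitting a model.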
There are different scales and outliers (which can be...