scikit-learn provides some built-in datasets that are useful for prototyping because they train quickly and offer different levels of complexity. They're all available in the sklearn.datasets package and share a common structure: the data attribute contains the whole input set X, while the target attribute contains the labels for classification or the target values for regression. For example, for the Boston house pricing dataset (used for regression), we have the following:
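# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this snippet requires an older release.
from sklearn.datasets import load_boston

boston = load_boston()
X = boston.data    # input samples
Y = boston.target  # target values

print(X.shape)  # (506, 13)
print(Y.shape)  # (506,)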
In this case, we have 506 samples with 13 features and a single target value. In this book, we're going to use it for regression and the handwritten digit dataset (load_digits(), a small MNIST-like collection of 8×8 images rather than the full MNIST set) for classification...
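The common data/target structure described above can be checked on loaders that ship with current scikit-learn releases. The following is a minimal sketch, not the book's own example: load_digits() is the classification dataset mentioned in the text, while load_diabetes() is used here only as an assumed stand-in for a regression dataset, since load_boston is unavailable in recent versions. The variable names are illustrative.

```python
from sklearn.datasets import load_diabetes, load_digits

# Classification: 8x8 handwritten digit images, flattened to 64 features
digits = load_digits()
X_digits, Y_digits = digits.data, digits.target
print(X_digits.shape)  # (1797, 64)
print(Y_digits.shape)  # (1797,)

# Regression stand-in (assumption: not the dataset used in the text)
diabetes = load_diabetes()
X_diabetes, Y_diabetes = diabetes.data, diabetes.target
print(X_diabetes.shape)  # (442, 10)
print(Y_diabetes.shape)  # (442,)
```

Both loaders also accept return_X_y=True, which returns the (data, target) pair directly instead of the container object.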