Getting started with scikit-learn
In this recipe, we introduce the basics of scikit-learn (http://scikit-learn.org), a machine learning package for Python. This package is the main tool we will use throughout this chapter. Its clean, consistent API makes it easy to define, train, and test models. scikit-learn is also designed for speed and for (relatively) big data.
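To make the "define, train, and test" workflow concrete, here is a minimal sketch of the estimator pattern using LinearRegression. The toy data (an exact line y = 2x + 1) is an assumption chosen for illustration; the point is the sequence of constructing an estimator, calling fit, then predict and score.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Assumed toy data: points lying exactly on y = 2x + 1.
# scikit-learn expects X with shape (n_samples, n_features).
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 2 * X[:, 0] + 1

model = LinearRegression()   # define the model
model.fit(X, y)              # train it on (X, y)
y_pred = model.predict(X)    # use it to make predictions
r2 = model.score(X, y)       # R^2 coefficient of determination

print("coef:", model.coef_, "intercept:", model.intercept_, "R^2:", r2)
```

The same fit/predict/score methods are shared by most scikit-learn estimators, which is why swapping one model for another usually requires changing only the constructor line.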
Here, we show a very basic example of linear regression in the context of curve fitting. This toy example will allow us to illustrate key concepts such as linear models, overfitting, underfitting, regularization, and cross-validation.
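As a rough preview of how these concepts fit together, the sketch below fits polynomial models of low and high degree (a common way to provoke underfitting and overfitting), adds a Ridge penalty as one form of regularization, and compares them with 5-fold cross-validation. The target curve sin(2πx), the noise level, the degrees, and alpha=1e-3 are all assumptions for illustration, not the recipe's own choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Assumed toy data: noisy samples of sin(2*pi*x) on [0, 1].
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, 30))
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(x.size)
X = x[:, np.newaxis]  # 2-D design matrix, one feature

models = {
    "degree 1 (may underfit)": make_pipeline(PolynomialFeatures(1), LinearRegression()),
    "degree 12 (may overfit)": make_pipeline(PolynomialFeatures(12), LinearRegression()),
    "degree 12 + ridge": make_pipeline(PolynomialFeatures(12), Ridge(alpha=1e-3)),
}

# Shuffle before splitting: x is sorted, so unshuffled folds would test extrapolation.
cv = KFold(n_splits=5, shuffle=True, random_state=0)

scores = {}
for name, model in models.items():
    # Default scoring for regressors is R^2; higher is better.
    scores[name] = cross_val_score(model, X, y, cv=cv).mean()
    print(f"{name}: mean CV R^2 = {scores[name]:.3f}")
```

Cross-validation scores each model on data it was not trained on, which is what exposes overfitting; the exact numbers depend on the assumed data and random seed.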
Getting ready
You can find all instructions to install scikit-learn in the main documentation. For more information, refer to http://scikit-learn.org/stable/install.html. With Anaconda, you can type conda install scikit-learn in a terminal.
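A quick way to check that the installation worked is to import the package from Python and print its version string; this is a minimal sanity check, not part of the official installation instructions.

```python
import sklearn

# If this import fails, scikit-learn is not installed in the active environment.
print("scikit-learn version:", sklearn.__version__)
```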
How to do it...
We will generate a one-dimensional dataset from a simple model (with some added noise), and we will try to fit a function to this data. With this function...
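The data-generation step described above might look like the following sketch. The underlying model f(x) = exp(3x), the 10 sample points, and the noise standard deviation of 2 are assumptions for illustration; the recipe's actual model may differ. A plain linear regression is then fitted to the noisy samples.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical "simple model" used to generate the data.
def f(x):
    return np.exp(3 * x)

rng = np.random.default_rng(42)
x = np.linspace(0.0, 1.0, 10)                # 10 sample points in [0, 1]
y = f(x) + 2 * rng.standard_normal(x.size)   # add Gaussian noise (std 2)

# scikit-learn expects a 2-D array of shape (n_samples, n_features).
X = x[:, np.newaxis]

lr = LinearRegression()
lr.fit(X, y)

# Evaluate the fitted straight line on a fine grid, e.g. for plotting against f.
x_grid = np.linspace(0.0, 1.0, 200)
y_grid = lr.predict(x_grid[:, np.newaxis])
print("slope:", lr.coef_[0], "intercept:", lr.intercept_)
```

Because f is strongly curved, a single straight line is expected to fit it only roughly; this is the kind of underfitting the recipe sets out to examine.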