Measuring prediction performance
We have already seen that the machine learning process consists of the following steps:
Model selection: We first select a suitable model for our data. Do we have labels? How many samples are available? Is the data separable? How many dimensions do we have? This step is nontrivial, and the right choice depends on the actual problem. As of Fall 2015, the scikit-learn documentation contains a much-appreciated flowchart called "Choosing the right estimator". It is short but very informative, and it is worth a closer look.
Training: We have to bring the model and the data together. In scikit-learn, this usually happens in the fit method of the model.
Application: Once we have trained our model, we can make predictions about unseen data.
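The three steps above can be sketched in a few lines. This is only an illustration, not a recommendation: the iris dataset, the k-nearest-neighbors classifier, the value n_neighbors=5, and the measurements in new_sample are all assumptions chosen for this example.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

# Illustrative data: the iris dataset bundled with scikit-learn (labeled samples).
iris = load_iris()
X, y = iris.data, iris.target

# Model selection: we have labels and only a few samples, so a simple
# classifier is a plausible starting point (an assumption for this sketch).
model = KNeighborsClassifier(n_neighbors=5)

# Training: bring the model and the data together.
model.fit(X, y)

# Application: predict the class of a new, unseen measurement.
# The four feature values below are made up for illustration.
new_sample = [[5.0, 3.4, 1.5, 0.2]]
predicted = model.predict(new_sample)
print(iris.target_names[predicted[0]])
```

Note that predict expects a two-dimensional input (one row per sample), which is why the single measurement is wrapped in an outer list.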
So far, we have omitted an important step that takes place between training and application: model testing and validation. In this step, we want to evaluate how well our model has learned.
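One common way to carry out this testing step is to hold back part of the labeled data, train on the rest, and compare the predictions on the held-back part with the known labels. The following is a minimal sketch under several assumptions: the iris dataset, the k-nearest-neighbors classifier, the 70/30 split, and accuracy as the score are illustrative choices. The import path sklearn.model_selection assumes scikit-learn 0.18 or later; older releases provided train_test_split in sklearn.cross_validation.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Illustrative data: the iris dataset bundled with scikit-learn.
X, y = load_iris(return_X_y=True)

# Hold back 30% of the samples; the model never sees them during training.
# random_state makes the split reproducible, stratify keeps class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Training happens on the training portion only.
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)

# Testing: predict the held-back samples and compare with their true labels.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy on held-out data: %.3f" % accuracy)
```

The score is computed only on samples that were not used for fitting, so it estimates how the model behaves on unseen data rather than how well it memorized the training set.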
One goal of learning, and...