Summary
In this chapter, we used several of scikit-learn's methods for building a standard workflow to run and evaluate data mining models. We introduced the Nearest Neighbors algorithm, which is already implemented in scikit-learn as an estimator. Using this class is quite easy: first, we call the fit method on our training data, and second, we use the predict method to predict the class of testing samples.
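As a reminder of that fit-then-predict pattern, here is a minimal sketch. The toy arrays and variable names are invented for illustration and are not the chapter's dataset; n_neighbors=3 is an arbitrary choice.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy data, made up for illustration: two well-separated groups of points
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                    [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
y_train = np.array([0, 0, 0, 1, 1, 1])
X_test = np.array([[0.05, 0.1], [5.0, 5.1]])

# Step 1: fit the estimator on the training data
estimator = KNeighborsClassifier(n_neighbors=3)
estimator.fit(X_train, y_train)

# Step 2: predict the class of each testing sample
y_predicted = estimator.predict(X_test)
print(y_predicted)
```

Each test sample is assigned the majority class among its three nearest training samples.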
We then looked at preprocessing as a way to fix poor feature scaling. This was done using a transformer object, in this case an instance of the MinMaxScaler class. Transformers also have a fit method, followed by a transform method, which takes a dataset as input and returns a transformed dataset as output.
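The fit-then-transform pattern can be sketched in the same way. Again, the small array below is invented for illustration; it simply has two features on very different scales.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy data, made up for illustration: the second feature is on a much larger scale
X = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [3.0, 500.0]])

# fit learns the per-feature minimum and maximum from the data
scaler = MinMaxScaler()
scaler.fit(X)

# transform rescales each feature to the default range of 0 to 1
X_scaled = scaler.transform(X)
print(X_scaled)
```

After the transform, both features span the same range, so neither dominates a distance-based method such as Nearest Neighbors purely because of its units.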
In the next chapter, we will use these concepts in a larger example, predicting the outcome of sports matches using real-world data.