Training models with scikit-learn
scikit-learn is one of the most widely used Python libraries for data science. It implements dozens of classic ML models, along with numerous tools that help you train them, such as preprocessing methods and cross-validation. Nowadays, you’ll probably hear more about newer libraries such as PyTorch, but scikit-learn remains a solid tool for many use cases.
To get started, first install it in your Python environment:
(venv) $ pip install scikit-learn
We can now start our scikit-learn journey!
Training models and predicting
In scikit-learn, ML models and algorithms are called estimators. Each is a Python class that implements the same methods. In particular, we have fit, which is used to train a model, and predict, which is used to run the trained model on new data.
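Before moving on to a real dataset, here is a minimal sketch of the fit and predict pattern. The choice of LinearRegression and the tiny hand-made arrays are assumptions made purely for illustration; any other estimator would be used the same way.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hand-made illustrative data following y = 2x + 1 (not a real dataset).
# scikit-learn expects X as a 2D array: one row per sample, one column per feature.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

# Every estimator is a class; instantiate it, then train it with fit.
model = LinearRegression()
model.fit(X, y)

# Run the trained model on new, unseen inputs with predict.
new_X = np.array([[4.0], [5.0]])
predictions = model.predict(new_X)
print(predictions)
```

Because the data lies exactly on a line, the predictions for 4 and 5 should come out very close to 9 and 11. The point to remember is the calling pattern: the same fit and predict calls apply whichever estimator class you pick.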
To try this, we’ll load a sample dataset. scikit-learn comes with a few toy datasets that are very useful for performing...