Chapter 12: Training Machine Learning Models with scikit-learn
As we mentioned in the introduction to the previous chapter, Python has become very popular in the data science field. We've seen that libraries such as NumPy and pandas have emerged to handle large datasets efficiently in Python. They are the foundation for libraries dedicated to machine learning (ML), such as the well-known scikit-learn library, a complete toolset that implements most of the algorithms and techniques data scientists use daily. In this chapter, we'll give a quick introduction to ML: what it is, which problems it tries to solve, and how. Then, we'll learn how to use scikit-learn to train and test ML models. We'll also take a deeper look at two classical ML models, Naive Bayes models and support vector machines, both of which can perform surprisingly well when used correctly.
In this chapter, we're going to cover the following main topics:
- What is...