In this chapter, we'll learn how to make predictions with scikit-learn. Machine learning emphasizes measuring the ability to predict, and scikit-learn lets us build predictive models and evaluate them quickly.
We will examine the iris dataset, which consists of measurements of three types of Iris flowers: Iris Setosa, Iris Versicolor, and Iris Virginica.
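As a quick orientation, the dataset ships with scikit-learn and can be loaded with `load_iris`. The variable names `X` and `y` below are just a common convention, not something the chapter has introduced yet.

```python
from sklearn.datasets import load_iris

# Load the bundled iris dataset.
iris = load_iris()
X, y = iris.data, iris.target  # X: measurements, y: flower type encoded as 0, 1, 2

print(X.shape)                      # (150, 4): 150 flowers, 4 measurements each
print(iris.feature_names)           # names of the four measurements
print(iris.target_names.tolist())   # ['setosa', 'versicolor', 'virginica']
```

Each row of `X` is one flower, and the matching entry of `y` says which of the three types it belongs to.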
To measure the strength of the predictions, we will:
- Save some data for testing
- Build a model using only training data
- Measure the predictive power on the test set
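The three steps above can be sketched as follows. This is a minimal illustration, not the chapter's own recipe: the choice of `LogisticRegression`, the 25% test size, the stratified split, and `random_state=0` are all assumptions made for the example.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# 1. Save some data for testing (25% here; split size and seed are assumptions).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# 2. Build a model using only the training data.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

# 3. Measure the predictive power on the held-out test set.
y_pred = clf.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {test_accuracy:.3f}")
```

Because the model never sees `X_test` during fitting, the accuracy printed at the end estimates how well it predicts flowers it has not encountered before.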
The prediction, one of three flower types, is categorical. This type of problem is called a classification problem.
Informally, classification asks, "Is it an apple or an orange?" Contrast this with regression, which asks, "How many apples?" For regression, the answer need not be a whole number: it can be 4.5 apples.
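To make the contrast concrete, here is a small sketch that uses the iris measurements for both kinds of question. The regression target (petal width) and both estimators are assumptions picked for illustration, and the models are fit and queried on the same rows only to show what the outputs look like, not to evaluate them.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression, LogisticRegression

X, y = load_iris(return_X_y=True)

# Classification: predict the flower type (a category) from all four measurements.
classifier = LogisticRegression(max_iter=1000).fit(X, y)
class_predictions = classifier.predict(X[:3])

# Regression: predict petal width in cm (a continuous quantity)
# from the other three measurements.
features, petal_width = X[:, :3], X[:, 3]
regressor = LinearRegression().fit(features, petal_width)
value_predictions = regressor.predict(features[:3])

print(class_predictions)  # integer class labels, each one of 0, 1, 2
print(value_predictions)  # real-valued predictions, generally not whole numbers
```

The classifier can only answer with one of the three labels, while the regressor is free to return any real number.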
As its design has evolved, scikit-learn has come to address machine learning mainly through four categories:
- Classification:
  - Non-text classification, like the Iris flowers example
  - Text classification
- Regression
- Clustering
- Dimensionality reduction