Understanding classification datasets – loading, managing, and visualizing the Iris dataset
In the previous recipe, we studied one of the most common problem types in SL: regression. In this recipe, we will take a closer look at another of these problem types: classification.
In classification problems, we want to estimate a categorical output, a class, from a set of given classes, using a variable number of input features. In this recipe, we will analyze a toy classification dataset from Kaggle: the Iris dataset, one of the most renowned classification datasets.
The Iris dataset presents the problem of estimating the class of an iris flower, from three classes (Iris setosa, Iris versicolor, and Iris virginica), with the help of the following four features:
- Sepal length (in cm)
- Sepal width (in cm)
- Petal length (in cm)
- Petal width (in cm)
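As a quick way to inspect the features and classes listed above, the following minimal sketch loads a copy of the Iris dataset. It assumes scikit-learn is installed and uses its bundled copy through `load_iris`, which is a stand-in for the Kaggle download this recipe refers to; the variable names `iris` and `class_counts` are illustrative only.

```python
# Sketch: load Iris from scikit-learn's bundled copy (an assumption; the
# recipe itself refers to the Kaggle version of the dataset).
from collections import Counter

from sklearn.datasets import load_iris

iris = load_iris()

# Names of the four numeric input features, all measured in cm.
print(iris.feature_names)

# Names of the three output classes; iris.target encodes them as 0, 1, 2.
print(list(iris.target_names))

# Feature matrix shape: one row per flower, one column per feature.
print(iris.data.shape)

# Count how many flowers belong to each class name.
class_counts = Counter(str(iris.target_names[t]) for t in iris.target)
print(class_counts)
```

Counting by class name rather than by integer label keeps the output readable and makes it easy to confirm how the flowers are distributed across the three classes.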
These data features are provided for 150 flowers, with 50 instances for each of the 3 classes (making...