Despite its name, logistic regression can actually be used as a model for classification. It uses a logistic function (or sigmoid) to convert any real-valued input x into a predicted output value ŷ that takes values between 0 and 1, as shown in the following figure:
The logistic function
Rounding ŷ to the nearest integer effectively classifies the input as belonging either to class 0 or 1.
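To make the squashing and rounding concrete, here is a minimal NumPy sketch of the logistic function and of the round-to-nearest-integer rule. The function names `sigmoid` and `classify` are our own, chosen for illustration:

```python
import numpy as np


def sigmoid(x):
    """Logistic (sigmoid) function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def classify(x):
    """Round the sigmoid output to the nearest integer to obtain a class label (0 or 1)."""
    return np.round(sigmoid(x)).astype(int)


inputs = np.array([-5.0, -0.5, 0.0, 0.5, 5.0])
print(sigmoid(inputs))   # every value lies strictly between 0 and 1
print(classify(inputs))  # -> [0 0 0 1 1]
```

Note that `np.round` rounds halves to the nearest even number. An output of exactly 0.5, which happens at x = 0, is therefore mapped to class 0. Other implementations may break this tie differently.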
Of course, most often our problems have more than one input or feature value. For example, the Iris dataset provides a total of four features. For the sake of simplicity, let's focus here on the first two features: sepal length (which we will call feature f1) and sepal width (which we will call f2). Using the tricks we learned when talking about linear regression, we can express the input x as a linear combination of the two features f1 and f2, weighted by the coefficients w1 and w2:
$$x = w_1 f_1 + w_2 f_2$$
However, in contrast to linear regression, we are not done yet. From the previous section, we know that this sum of products results in a real-valued output, but we are interested in a categorical value, zero or one. This is where the logistic function comes in. It acts as a squashing function, σ, that compresses the range of possible output values to the range (0, 1):
$$\hat{y} = \sigma(x) = \frac{1}{1 + e^{-x}}$$
Because the output is always between 0 and 1, it can be interpreted as a probability. If we only have a single input variable x, the output value ŷ can be interpreted as the probability of x belonging to class 1.
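As a quick numerical illustration of the two-feature case, the sketch below combines f1 and f2 with a pair of weights, squashes the result with σ, and reads the output as the probability of class 1. The weights and data points are made up for this example and are not learned from the Iris data:

```python
import numpy as np


def sigmoid(x):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


# Hypothetical weights, for illustration only (not learned from data).
w1, w2 = 1.5, -2.0

# Two hypothetical data points: f1 = sepal length, f2 = sepal width (in cm).
f1 = np.array([4.5, 6.5])
f2 = np.array([3.5, 2.8])

x = w1 * f1 + w2 * f2            # linear combination of the two features
p_class1 = sigmoid(x)            # probability that each point belongs to class 1
p_class0 = 1.0 - p_class1        # the complementary probability of class 0
labels = np.round(p_class1).astype(int)

print(x)         # -> [-0.25  4.15] (up to floating-point rounding)
print(p_class1)
print(labels)    # -> [0 1]
```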
Now let's apply this knowledge to the Iris dataset!
The Iris dataset is included with scikit-learn. We first load all the necessary modules, as we did in our earlier examples:
In [1]: import numpy as np
... import cv2
... from sklearn import datasets
... from sklearn import model_selection
... from sklearn import metrics
... import matplotlib.pyplot as plt
... %matplotlib inline
In [2]: plt.style.use('ggplot')
Then, loading the dataset is a one-liner:
In [3]: iris = datasets.load_iris()
This function returns a dictionary-like object (a scikit-learn `Bunch`) that we call iris. It contains several different fields:
In [4]: dir(iris)
Out[4]: ['DESCR', 'data', 'feature_names', 'target', 'target_names']
Here, all the data points are contained in 'data'. There are 150 data points, each of which has four feature values:
In [5]: iris.data.shape
Out[5]: (150, 4)
These four features correspond to the sepal and petal dimensions mentioned earlier:
In [6]: iris.feature_names
Out[6]: ['sepal length (cm)',
         'sepal width (cm)',
         'petal length (cm)',
         'petal width (cm)']
For every data point, we have a class label stored in target:
In [7]: iris.target.shape
Out[7]: (150,)
We can also inspect the class labels, and find that there is a total of three classes:
In [8]: np.unique(iris.target)
Out[8]: array([0, 1, 2])
For the sake of simplicity, we want to focus on a binary classification problem for now, where we only have two classes. The easiest way to do this is to discard all data points belonging to a certain class, such as class label 2, by selecting all the rows that do not belong to class 2:
In [9]: idx = iris.target != 2
... data = iris.data[idx].astype(np.float32)
... target = iris.target[idx].astype(np.float32)
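If boolean indexing is new to you, the following stand-alone sketch mimics the step above on placeholder data. It assumes, as is the case for the Iris dataset, three classes with 50 samples each. The feature values are dummy numbers, not real measurements:

```python
import numpy as np

# Stand-in for iris.target: 50 samples each of classes 0, 1, and 2.
labels = np.repeat([0, 1, 2], 50)
# Placeholder features with the same shape as iris.data (150 rows, 4 columns).
features = np.arange(150 * 4, dtype=np.float64).reshape(150, 4)

idx = labels != 2                          # boolean mask: True for classes 0 and 1
data = features[idx].astype(np.float32)    # keep only the rows where idx is True
target = labels[idx].astype(np.float32)

print(idx.sum())          # -> 100 rows kept
print(data.shape)         # -> (100, 4)
print(np.unique(target))  # -> [0. 1.]
```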
Before you get started with setting up a model, it is always a good idea to have a look at the data. We did this earlier for the town map example, so let's continue our streak. Using Matplotlib, we create a scatter plot where the color of each data point corresponds to the class label:
In [10]: plt.scatter(data[:, 0], data[:, 1], c=target, cmap=plt.cm.Paired, s=100)
... plt.xlabel(iris.feature_names[0])
... plt.ylabel(iris.feature_names[1])
Out[10]: <matplotlib.text.Text at 0x23bb5e03eb8>
To make plotting easier, we limit ourselves to the first two features (iris.feature_names[0] being the sepal length and iris.feature_names[1] being the sepal width). We can see a nice separation of classes in the following figure:
Plotting the first two features of the Iris dataset
We learned in the previous chapter that it is essential to keep training and test data separate. We can easily split the data using one of scikit-learn's many helper functions:
In [11]: X_train, X_test, y_train, y_test = model_selection.train_test_split(
... data, target, test_size=0.1, random_state=42
... )
Here we want to split the data into 90 percent training data and 10 percent test data, which we specify with test_size=0.1. By inspecting the return values, we note that we ended up with exactly 90 training data points and 10 test data points:
In [12]: X_train.shape, y_train.shape
Out[12]: ((90, 4), (90,))
In [13]: X_test.shape, y_test.shape
Out[13]: ((10, 4), (10,))
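Under the hood, a random split boils down to shuffling the row indices and cutting the shuffled list in two. The helper below, `shuffle_split`, is a hypothetical NumPy sketch of that idea and not scikit-learn's implementation. With the same seed, it will not select the same rows as train_test_split:

```python
import math

import numpy as np


def shuffle_split(X, y, test_size=0.1, seed=42):
    """Hypothetical helper: shuffle the rows, then hold out a fraction as the test set."""
    n_samples = len(X)
    n_test = math.ceil(test_size * n_samples)   # round the test-set size up (sketch's choice)
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)           # random order of the row indices
    test_idx, train_idx = perm[:n_test], perm[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


# Placeholder data with the same shape as the two-class Iris subset.
X = np.arange(100 * 4, dtype=np.float32).reshape(100, 4)
y = np.repeat([0.0, 1.0], 50).astype(np.float32)

X_train, X_test, y_train, y_test = shuffle_split(X, y, test_size=0.1)
print(X_train.shape, X_test.shape)   # -> (90, 4) (10, 4)
```

Because every row index appears exactly once in the permutation, the training and test sets never share a data point.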
Creating a logistic regression classifier involves pretty much the same steps as setting up k-NN:
In [14]: lr = cv2.ml.LogisticRegression_create()
We then have to specify the desired training method. Here, we can choose between cv2.ml.LogisticRegression_BATCH and cv2.ml.LogisticRegression_MINI_BATCH. For now, all we need to know is that we want to update the model after every data point. We achieve this by choosing mini-batch training with a mini-batch size of 1:
In [15]: lr.setTrainMethod(cv2.ml.LogisticRegression_MINI_BATCH)
... lr.setMiniBatchSize(1)
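With a mini-batch size of 1, the weights are updated after every single data point, which is commonly known as stochastic gradient descent. In batch mode, by contrast, one update uses the gradient averaged over the whole training set. The sketch below shows a single such update for logistic regression trained with the cross-entropy loss. The function name `sgd_step` and the learning rate are hypothetical, and OpenCV's internal update may differ in details such as regularization:

```python
import numpy as np


def sigmoid(x):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def sgd_step(w, f, y, learning_rate=0.1):
    """One gradient step computed from a single data point (mini-batch size 1).

    w: weight vector, f: feature vector of one sample, y: its label (0 or 1).
    For one sample, the gradient of the cross-entropy loss is (sigmoid(w . f) - y) * f.
    """
    y_hat = sigmoid(np.dot(w, f))
    return w - learning_rate * (y_hat - y) * f


w = np.zeros(2)                 # start with all weights at zero
f = np.array([1.0, 2.0])        # a hypothetical data point belonging to class 1
w_new = sgd_step(w, f, y=1.0)
print(w_new)                    # -> [0.05 0.1 ]
```

After the step, the model's prediction for this point moves from 0.5 toward the correct label 1.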
We also want to specify the number of iterations the algorithm should run before it terminates:
In [16]: lr.setIterations(100)
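To see how the training method, the mini-batch size, and the number of iterations fit together, here is a self-contained sketch of a complete training loop in NumPy. It makes two choices of its own. First, one iteration means one pass over the shuffled training set. Second, a constant column of ones is appended to the data so that one weight acts as a bias. Neither choice describes how OpenCV counts iterations or stores the bias. The toy data are made up:

```python
import numpy as np


def sigmoid(x):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def add_bias_column(X):
    """Append a column of ones so that the last weight acts as a bias term."""
    return np.hstack([X, np.ones((X.shape[0], 1))])


def train_sgd(X, y, n_iterations=100, learning_rate=0.1, seed=0):
    """Hypothetical trainer: logistic regression fitted with per-sample SGD.

    One "iteration" here is one full pass over the training set,
    visiting the samples in a freshly shuffled order each time.
    """
    rng = np.random.default_rng(seed)
    Xb = add_bias_column(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iterations):
        for i in rng.permutation(len(Xb)):
            y_hat = sigmoid(Xb[i] @ w)                    # current prediction for sample i
            w -= learning_rate * (y_hat - y[i]) * Xb[i]   # update after this one sample
    return w


def predict(w, X):
    """Return class labels (0.0 or 1.0) by thresholding the probabilities at 0.5."""
    return (sigmoid(add_bias_column(X) @ w) >= 0.5).astype(np.float32)


# Toy data: two well-separated clusters, made up for this sketch.
X = np.array([[1.0, 1.0], [1.5, 0.5], [0.5, 1.5],
              [4.0, 4.0], [4.5, 3.5], [3.5, 4.5]])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

w = train_sgd(X, y, n_iterations=100)
train_accuracy = np.mean(predict(w, X) == y)
print(w, train_accuracy)
```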
We can then call the training method of the object (in the exact same way as we did earlier), which will return True upon success:
In [17]: lr.train(X_train, cv2.ml.ROW_SAMPLE, y_train)
Out[17]: True
As we just saw, the goal of the training phase is to find a set of weights that best transform the feature values into an output label. A single data point is given by its four feature values (f0, f1, f2, f3). Since we have four features, we should also get four weights, so that x = w0 f0 + w1 f1 + w2 f2 + w3 f3, and ŷ = σ(x). However, as discussed previously, the algorithm adds an extra weight b that acts as an offset or bias, so that x = b + w0 f0 + w1 f1 + w2 f2 + w3 f3. OpenCV places this bias first in the learned parameter vector, which we can retrieve as follows:
In [18]: lr.get_learnt_thetas()
Out[18]: array([[-0.04109113, -0.01968078, -0.16216497, 0.28704911,
0.11945518]], dtype=float32)
This means that the bias is b = -0.0411 and that the input to the logistic function is x = -0.0411 - 0.0197 f0 - 0.162 f1 + 0.287 f2 + 0.119 f3. Then, when we feed in a new data point (f0, f1, f2, f3) that belongs to class 1, the output ŷ = σ(x) should exceed 0.5 (ideally approaching 1) so that the model predicts class 1. But how well does that actually work?
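Because σ increases monotonically and σ(0) = 0.5, thresholding ŷ at 0.5 gives the same answer as checking whether x is non-negative. In other words, the sign of the linear combination alone decides the class. The following sketch checks this on two points. It uses a made-up bias and made-up weights rather than the learned values above:

```python
import numpy as np


def sigmoid(x):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


# Hypothetical bias and weights for four features (illustration only).
bias = -1.0
weights = np.array([0.2, -0.5, 0.8, 0.4])

# Hypothetical data points, one per row: (f0, f1, f2, f3).
points = np.array([[5.0, 3.5, 1.5, 0.2],
                   [6.0, 2.8, 4.5, 1.4]])

x = points @ weights + bias     # input to the logistic function, one value per point
y_hat = sigmoid(x)              # predicted probability of class 1
by_probability = y_hat >= 0.5   # threshold the probability ...
by_sign = x >= 0                # ... or, equivalently, look at the sign of x

print(x)               # -> approximately [-0.47  2.96]
print(by_probability)  # -> [False  True]
```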
Let's see for ourselves by calculating the accuracy score on the training set:
In [19]: ret, y_pred = lr.predict(X_train)
In [20]: metrics.accuracy_score(y_train, y_pred)
Out[20]: 1.0
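The accuracy score is simply the fraction of predictions that match the true labels. A minimal hand-rolled version is sketched below, and the function name `accuracy` is our own. Both arrays are flattened first. Comparing an (N,) array with an (N, 1) column vector in NumPy would broadcast to an (N, N) matrix and silently give a wrong result, which matters if the predictions come back as a column vector:

```python
import numpy as np


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    y_true = np.asarray(y_true).ravel()   # flatten to shape (N,)
    y_pred = np.asarray(y_pred).ravel()
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must contain the same number of labels")
    return float(np.mean(y_true == y_pred))


y_true = np.array([0.0, 1.0, 1.0, 0.0], dtype=np.float32)
y_pred = np.array([[0], [1], [0], [0]], dtype=np.int32)   # column vector, shape (4, 1)
print(accuracy(y_true, y_pred))   # -> 0.75
```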
Perfect score! However, this only means that the model was able to perfectly memorize the training dataset. This does not mean that the model would be able to classify a new, unseen data point. For this, we need to check the test dataset:
In [21]: ret, y_pred = lr.predict(X_test)
... metrics.accuracy_score(y_test, y_pred)
Out[21]: 1.0
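Luckily, we get another perfect score! This suggests that the model generalizes well to unseen data. With only 10 test points on an easily separable two-class problem, however, a perfect score is encouraging rather than conclusive.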
If you enjoyed building a classifier using logistic regression and would like to learn more machine learning tasks using OpenCV, be sure to check out the book, Machine Learning for OpenCV, where this section originally appears.