Handling multiclass classification
One last thing worth noting is how logistic regression algorithms deal with multiclass classification. Although we interact with scikit-learn classifiers in multiclass cases the same way as in binary cases, it is useful to understand how logistic regression itself extends to more than two classes.
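For example (a minimal sketch assuming scikit-learn and its bundled Iris dataset, chosen here purely for illustration), the same LogisticRegression estimator handles a three-class problem exactly as it handles a binary one:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Iris has three classes; the API is identical to the binary case
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# With the default lbfgs solver, more than two classes are handled
# with the softmax (multinomial) formulation
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

# predict_proba returns one probability per class, summing to 1 per sample
print(clf.predict_proba(X_test[:3]))
print(clf.score(X_test, y_test))
```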
Logistic regression for more than two classes is also called multinomial logistic regression, more commonly known these days as softmax regression. As you have seen in the binary case, the model is represented by one weight vector w, and the probability of the target being 1 (the positive class) is written as follows:
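Here, x denotes the feature vector (with any intercept term absorbed into w), and the sigmoid function maps the linear combination $w^T x$ to a probability:

$$\hat{y} = P(y = 1 \mid x) = \frac{1}{1 + \exp(-w^{T} x)}$$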
In the K-class case, the model is represented by K weight vectors, $w_1, w_2, \ldots, w_K$, and the probability of the target being class k is written as follows:
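In the same notation, the softmax function exponentiates each class score $w_k^T x$ and normalizes it across all K classes:

$$P(y = k \mid x) = \frac{\exp(w_k^{T} x)}{\sum_{j=1}^{K} \exp(w_j^{T} x)}$$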
Note that the denominator term, $\sum_{j=1}^{K} \exp(w_j^{T} x)$, normalizes the probabilities $P(y = k \mid x)$ (for k from 1 to K) so that they total 1.
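As a quick sanity check, the normalization can be verified numerically; the following toy NumPy sketch uses made-up weights for a three-class model:

```python
import numpy as np

# Three weight vectors (K = 3), one per class, scoring a single feature vector x
W = np.array([[0.5, -1.0, 2.0],
              [1.5,  0.3, -0.2],
              [-0.7, 0.8, 0.1]])   # rows are w_1, w_2, w_3
x = np.array([1.0, 2.0, -1.0])

scores = W @ x                     # w_k^T x for each class k
probs = np.exp(scores) / np.exp(scores).sum()

print(probs)        # per-class probabilities
print(probs.sum())  # 1.0
```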
The cost function in the binary case...