Modeling class probabilities via logistic regression
Although the perceptron rule offers a nice and easy-going introduction to machine learning algorithms for classification, its biggest disadvantage is that it never converges if the classes are not perfectly linearly separable. The classification task in the previous section would be an example of such a scenario. The reason for this is that the weights are continuously being updated since there is always at least one misclassified training example present in each epoch. Of course, you can change the learning rate and increase the number of epochs, but be warned that the perceptron will never converge on this dataset.
To make better use of our time, we will now take a look at another simple, yet more powerful, algorithm for linear and binary classification problems: logistic regression. Note that, despite its name, logistic regression is a model for classification, not regression.