After discussing the basics of logistic regression, it's useful to introduce the SGDClassifier class, which implements a very common algorithm that can be applied to several different loss functions. The idea behind SGD is to minimize a cost function by iterating a weight update based on the gradient:

\[
w^{(k+1)} = w^{(k)} - \gamma \, \nabla_{w} L\left(w^{(k)}\right)
\]
However, instead of considering the whole dataset, the update is applied to mini-batches randomly drawn from it (for this reason, this variant is often also called mini-batch gradient descent). In the preceding formula, L is the cost function we want to minimize with respect to the weights w (as discussed in Chapter 2, Important Elements in Machine Learning) and γ (exposed as eta0 in scikit-learn) is the learning rate, a parameter that can be kept constant or decayed as the learning process proceeds. The learning_rate hyperparameter can also...
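To make the update rule concrete, the following minimal sketch (using a synthetic dataset and made-up hyperparameter values) first implements the mini-batch update on the logistic loss with plain NumPy, then fits the same loss with SGDClassifier; note that the logistic loss is selected with loss='log_loss' in recent scikit-learn releases (it was named 'log' in older ones):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# Synthetic binary classification problem (illustrative values throughout)
X, y = make_classification(n_samples=500, n_features=4, random_state=0)

# --- Manual mini-batch SGD on the logistic loss ---
w = np.zeros(X.shape[1])
b = 0.0
gamma = 0.01           # learning rate (gamma above, eta0 in scikit-learn)
batch_size = 32
rng = np.random.default_rng(0)

for epoch in range(20):
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        Xb, yb = X[batch], y[batch]
        p = 1.0 / (1.0 + np.exp(-(Xb @ w + b)))  # sigmoid predictions
        grad_w = Xb.T @ (p - yb) / len(batch)    # gradient of L w.r.t. w
        grad_b = np.mean(p - yb)                 # gradient of L w.r.t. b
        w -= gamma * grad_w                      # the update formula above
        b -= gamma * grad_b

# --- The same loss optimized by SGDClassifier ---
sgd = SGDClassifier(loss='log_loss',        # 'log' in older scikit-learn
                    learning_rate='constant',
                    eta0=0.01,
                    max_iter=1000,
                    random_state=0)
sgd.fit(X, y)
print(sgd.score(X, y))
```

In the manual loop, each mini-batch triggers one application of the update formula; SGDClassifier carries out the analogous updates internally, with eta0 supplying the learning rate when learning_rate='constant'.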