Stochastic gradient descent algorithms
After discussing the basics of logistic regression, it's useful to introduce the SGDClassifier class, which implements a very well-known algorithm that can be applied to several different loss functions. The idea behind stochastic gradient descent is to iterate a weight update based on the gradient of the loss function:

$$w^{(k+1)} = w^{(k)} - \gamma \nabla L\left(w^{(k)}\right)$$

However, instead of considering the whole dataset, the update procedure is applied on minibatches randomly extracted from it. In the preceding formula, L is the loss function we want to minimize (as discussed in Chapter 2, Important Elements in Machine Learning) and gamma (eta0 in scikit-learn) is the learning rate, a parameter that can be constant or decayed as the learning process proceeds.
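To make the update rule concrete, here is a minimal NumPy sketch of minibatch SGD on the logistic loss; the function name, the constant learning rate, and the batch size are assumptions made only for this example:

```python
import numpy as np

def sgd_logistic(X, Y, gamma=0.01, batch_size=32, n_epochs=100):
    # Minimal sketch: Y is assumed to contain 0/1 labels
    w = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        # Randomly extract minibatches from the dataset
        indices = np.random.permutation(X.shape[0])
        for start in range(0, X.shape[0], batch_size):
            batch = indices[start:start + batch_size]
            Xb, Yb = X[batch], Y[batch]
            # Gradient of the average logistic loss over the minibatch
            p = 1.0 / (1.0 + np.exp(-Xb.dot(w)))
            gradient = Xb.T.dot(p - Yb) / len(batch)
            # Weight update: w(k+1) = w(k) - gamma * grad(L)
            w -= gamma * gradient
    return w
```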
The learning_rate parameter can also be left at its default value (optimal), which is computed internally according to the regularization factor.
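As a minimal usage sketch (assuming scikit-learn >= 1.1, where the logistic loss is selected with loss='log_loss'; older versions use loss='log'; the dataset is synthetic and created only for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

# Synthetic dataset, only for illustration
X, Y = make_classification(n_samples=500, n_features=10, random_state=1000)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.25, random_state=1000)

# loss='log_loss' trains a logistic regression with SGD.
# learning_rate='optimal' derives the schedule from alpha,
# so eta0 is not needed in this configuration.
sgd = SGDClassifier(loss='log_loss',
                    learning_rate='optimal',
                    alpha=0.0001,
                    max_iter=1000,
                    random_state=1000)
sgd.fit(X_train, Y_train)
print(sgd.score(X_test, Y_test))
```

Choosing loss='hinge' instead would train a linear SVM with the same update machinery, which is what makes the class applicable to several different loss functions.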
The process should end when the weights stop changing, or when their variation remains below a selected threshold.
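In scikit-learn, a related stopping criterion is exposed through the tol and n_iter_no_change parameters (based on the training loss rather than on the weights). The weight-variation check described above can also be sketched manually with partial_fit; the threshold value here is an arbitrary assumption:

```python
import numpy as np

from sklearn.linear_model import SGDClassifier

# X_train and Y_train are assumed to come from the previous snippet
sgd = SGDClassifier(loss='log_loss', learning_rate='optimal', random_state=1000)

threshold = 1e-4  # arbitrary choice, only for illustration
previous_w = None

for epoch in range(1000):
    sgd.partial_fit(X_train, Y_train, classes=np.unique(Y_train))
    w = sgd.coef_.copy()
    # Stop when the weight variation falls below the threshold
    if previous_w is not None and np.linalg.norm(w - previous_w) < threshold:
        break
    previous_w = w
```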