Linear regression and gradient descent
We can use gradient descent, rather than ordinary least squares, to estimate our linear regression parameters. Gradient descent searches iteratively for the coefficient values that minimize the residual sum of squares. It starts with arbitrary (often random) coefficient values and calculates the residual sum of squares for that iteration. It then adjusts the coefficients in the direction that reduces the residual sum of squares, following the negative of the gradient, and repeats until the improvement becomes negligible. We specify a learning rate when using gradient descent. The learning rate controls how far the coefficients move at each step: too small a value makes convergence slow, while too large a value can overshoot the minimum.
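The procedure above can be sketched in a few lines of NumPy. This is a minimal illustration, not the text's own code: the simulated data, coefficient names, learning rate, and iteration count are all assumptions chosen for the example.

```python
import numpy as np

# Simulated data with known coefficients (illustrative values, not from the text):
# true intercept 3, true slope 2, plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = 3.0 + 2.0 * x + rng.normal(0, 1, 200)

b0, b1 = 0.0, 0.0       # starting coefficient values
learning_rate = 0.01    # controls the step size, not the improvement
n = len(x)

for _ in range(5000):
    resid = y - (b0 + b1 * x)            # residuals at the current coefficients
    # Gradient of the (mean) residual sum of squares with respect to each coefficient
    grad_b0 = -2 * resid.sum() / n
    grad_b1 = -2 * (resid * x).sum() / n
    # Step against the gradient to reduce the residual sum of squares
    b0 -= learning_rate * grad_b0
    b1 -= learning_rate * grad_b1

print(b0, b1)  # estimates should land near the true values 3 and 2
```

Dividing the gradients by n keeps the step size insensitive to the number of observations, which makes one learning rate work across differently sized samples.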
Gradient descent is often a good choice when working with very large datasets, and it may be the only choice if the full dataset does not fit into your machine's memory, since the data can be processed in batches. We will use both OLS and gradient descent to estimate our parameters in the next section.
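To illustrate the out-of-memory case, here is a hedged sketch of mini-batch gradient descent, where coefficients are updated after each batch rather than after a full pass. The batch generator below fakes streaming from disk by slicing an in-memory array; the data, batch size, and epoch count are assumptions for the example.

```python
import numpy as np

# Illustrative data; in practice each batch would be read from disk or a database.
rng = np.random.default_rng(1)
x_all = rng.uniform(0, 10, 10_000)
y_all = 3.0 + 2.0 * x_all + rng.normal(0, 1, 10_000)

def batches(x, y, size=500):
    """Yield successive (x, y) batches, as if streamed from a file."""
    for start in range(0, len(x), size):
        yield x[start:start + size], y[start:start + size]

b0, b1 = 0.0, 0.0
learning_rate = 0.01

for epoch in range(50):              # several passes over the "file"
    for xb, yb in batches(x_all, y_all):
        resid = yb - (b0 + b1 * xb)  # residuals on this batch only
        # Update coefficients using the batch's gradient of the mean squared residuals
        b0 -= learning_rate * (-2 * resid.mean())
        b1 -= learning_rate * (-2 * (resid * xb).mean())
```

Because each update touches only one batch, peak memory usage is bounded by the batch size rather than the dataset size, which is what makes this approach viable when OLS on the full design matrix is not.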