Multiple regression with gradient descent
When we ran multiple linear regression in Chapter 3, Correlation, we used the normal equation and matrices to arrive quickly at the coefficients for a multiple linear regression model. The normal equation is repeated as follows:

\beta = (X^{T}X)^{-1}X^{T}y
The normal equation uses matrix algebra to arrive at the least squares estimates very quickly and efficiently. Where all the data fits in memory, it is a very convenient and concise equation. Where the data exceeds the memory available to a single machine, however, the calculation becomes unwieldy. The reason for this is matrix inversion. The calculation of (X^{T}X)^{-1} is not something that can be accomplished on a fold over the data: each cell in the output matrix depends on many others in the input matrix. These complex relationships require that the matrix be processed in a nonsequential way.
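To make the in-memory case concrete, the following is a minimal sketch of the normal equation using clojure.core.matrix. The library choice, the :vectorz implementation, and the example data are assumptions for illustration only, not code from this chapter:

(require '[clojure.core.matrix :as m])

;; Assumes the vectorz-clj implementation is on the classpath so that
;; matrix inversion is supported; an illustrative choice only.
(m/set-current-implementation :vectorz)

;; A small, hypothetical design matrix: a column of ones for the
;; intercept plus a single feature column.
(def X (m/matrix [[1.0 1.0]
                  [1.0 2.0]
                  [1.0 3.0]]))

;; Hypothetical observed responses.
(def y (m/array [2.1 3.9 6.2]))

;; beta = (X^T X)^-1 X^T y
(def Xt (m/transpose X))
(def coefficients
  (m/mmul (m/inverse (m/mmul Xt X))
          (m/mmul Xt y)))

Even in this tiny example, the (m/inverse ...) step operates on the whole of X^{T}X at once; it is precisely this step that resists being expressed as a fold over the data.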
An alternative approach to solving linear regression problems, and many other related machine learning problems, is a technique called gradient descent...