Numerical optimization
This section briefly introduces the different optimization algorithms that can be applied to minimize the loss function, with or without a penalty term. These algorithms are described in greater detail in the Summary of optimization technique section of the Appendix.
First, let's define the least squares problem. Minimizing the loss function consists of setting its first-order derivatives to zero, which generates a system of D equations (also known as the gradient equations), D being the number of regression weights (parameters). The weights are then computed iteratively by solving this system of equations with a numerical optimization algorithm.
M10: The least squares loss function L for residuals ri, weights w, a model f, input data xi, and expected values yi is defined as follows:

$$r_i = y_i - f(x_i, w), \qquad L(w) = \frac{1}{2}\sum_{i=1}^{n} r_i^2$$
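As a minimal illustration of M10, the following Python sketch evaluates the loss for a hypothetical linear model f(x, w) = w0 + w1*x on made-up data points; the model and data are assumptions for the example only, not taken from the text.

```python
import numpy as np

# Hypothetical model for illustration: f(x, w) = w0 + w1 * x
def loss(w, x, y):
    residuals = y - (w[0] + w[1] * x)    # r_i = y_i - f(x_i, w)
    return 0.5 * np.sum(residuals ** 2)  # L(w) = 1/2 * sum(r_i^2)

x = np.array([0.0, 1.0, 2.0, 3.0])       # made-up input data
y = np.array([1.1, 2.9, 5.2, 6.8])       # made-up expected values
print(loss(np.array([1.0, 2.0]), x, y))  # loss at weights w = (1, 2)
```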
M11: The gradient equations generated by minimizing the loss function L involve the Jacobian matrix J (refer to the Basics of differential calculus section of the Appendix) and are defined as follows:

$$\frac{\partial L}{\partial w_j} = \sum_{i=1}^{n} r_i \frac{\partial r_i}{\partial w_j} = -\sum_{i=1}^{n} r_i J_{ij} = 0, \qquad J_{ij} = \frac{\partial f(x_i, w)}{\partial w_j}$$
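One classical way to solve the gradient equations iteratively is the Gauss-Newton method, which repeatedly linearizes the model through its Jacobian and solves the resulting normal equations for a weight update. The sketch below assumes a hypothetical exponential model f(x, w) = w0*exp(w1*x) and synthetic data; it illustrates the technique under those assumptions and is not the specific implementation used in this text.

```python
import numpy as np

# Hypothetical model for illustration: f(x, w) = w0 * exp(w1 * x)
def model(x, w):
    return w[0] * np.exp(w[1] * x)

def jacobian(x, w):
    # J[i, j] = d f(x_i, w) / d w_j
    df_dw0 = np.exp(w[1] * x)
    df_dw1 = w[0] * x * np.exp(w[1] * x)
    return np.column_stack((df_dw0, df_dw1))

def gauss_newton(x, y, w, num_iters=20):
    for _ in range(num_iters):
        r = y - model(x, w)   # residuals r_i = y_i - f(x_i, w)
        J = jacobian(x, w)    # Jacobian of the model at the current weights
        # Solve the normal equations J^T J dw = J^T r for the update step
        dw = np.linalg.solve(J.T @ J, J.T @ r)
        w = w + dw
    return w

# Synthetic data generated from known weights, then recovered by the solver
rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0, 50)
y = 2.0 * np.exp(0.8 * x) + rng.normal(scale=0.05, size=x.size)
w_fit = gauss_newton(x, y, w=np.array([1.0, 0.5]))
print(w_fit)  # approximately [2.0, 0.8]
```

Gauss-Newton avoids computing second derivatives by approximating the Hessian with J^T J; the algorithms surveyed in the Summary of optimization technique section of the Appendix make different trade-offs around this same system of equations.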