Gradient descent
As we hinted at the end of the last section, we can’t always use the closed-form OLS solution of Eq. 20. What are our options? To construct a more general approach to empirical risk minimization, we need to revisit the shape of the empirical risk function so that we can understand how to locate its minima.
Locating the minimum of a simple risk function
To understand the shape of the empirical risk function, let’s take a simple example: the risk function for a linear model with a squared-loss function. The model has a single feature, so it has the following form:
$$\hat{y} = \beta x \qquad \text{(Eq. 23)}$$
The model has a single parameter, β, which multiplies the single feature x. Figure 4.7 plots the empirical risk against the value of β, where we have calculated the empirical risk on a dataset...
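To make the shape of this risk function concrete, here is a minimal Python sketch that evaluates the empirical risk of the one-parameter model over a grid of β values. The dataset is synthetic and purely hypothetical (it is not the dataset behind Figure 4.7), the names `empirical_risk`, `xs`, `ys` and `betas` are illustrative, and the risk is assumed to be the plain average of the squared losses, which may differ by a constant factor from the normalization used elsewhere in the text.

```python
import random


def empirical_risk(beta, xs, ys):
    """Average squared loss of the one-parameter model y_hat = beta * x."""
    n = len(xs)
    return sum((y - beta * x) ** 2 for x, y in zip(xs, ys)) / n


# Hypothetical synthetic data (an assumption for illustration only):
# y = 2x plus a small amount of Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(50)]
ys = [2.0 * x + rng.gauss(0.0, 0.1) for x in xs]

# Evaluate the risk on a grid of beta values from 0.0 to 4.0 in steps of 0.1.
betas = [i * 0.1 for i in range(41)]
risks = [empirical_risk(b, xs, ys) for b in betas]

# The grid point with the lowest risk approximates the minimizer.
best_beta = betas[risks.index(min(risks))]

# For this particular model the minimizer is also available in closed form:
# setting the derivative of the risk to zero gives sum(x*y) / sum(x*x).
beta_closed = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

print(f"grid minimizer: {best_beta:.1f}")
print(f"closed-form minimizer: {beta_closed:.3f}")
```

Because the risk is a quadratic function of β, the grid values trace out a parabola with a single minimum, and the best grid point lands within half a grid step of the closed-form minimizer. Scanning a grid like this is only practical for one or two parameters, which is part of the motivation for a more general way of locating the minimum.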