Let's consider a small dataset built by adding some uniform noise to points sampled from the segment bounded between -6 and 6. The original equation is y = x + 2 + n, where n is a noise term.
In the following figure, there's a plot with a candidate regression function:
![](https://static.packt-cdn.com/products/9781785889622/graphics/assets/9ae5a89b-e4e7-4158-906a-cd3b8567c58c.png)
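A dataset like the one described above can be generated with a few lines of NumPy. This is only a sketch: the sample count, the random seed, and the noise amplitude (here ±1.5) are assumptions, since the text doesn't specify them.

```python
import numpy as np

# Assumed values, not given in the text
nb_samples = 200
np.random.seed(1000)

# x values sampled uniformly from the segment [-6, 6]
X = np.random.uniform(-6.0, 6.0, size=nb_samples)

# y = x + 2 + n, with n drawn from a uniform noise term
Y = X + 2.0 + np.random.uniform(-1.5, 1.5, size=nb_samples)
```

The arrays `X` and `Y` play the role of the sample points plotted in the figure.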
As we're working on a plane, the regressor we're looking for is a function of only two parameters:
![](https://static.packt-cdn.com/products/9781785889622/graphics/assets/84566164-c518-4bb1-974f-f9ec18c878f2.png)
In order to fit our model, we must find the best parameters, and to do that we adopt an ordinary least squares approach. The loss function to minimize is:
![](https://static.packt-cdn.com/products/9781785889622/graphics/assets/8361aa4d-cf56-4843-8422-71887faf3785.png)
With an analytic approach, in order to find the global minimum, we must impose:
![](https://static.packt-cdn.com/products/9781785889622/graphics/assets/da0baf5a-bc3c-46ae-8536-8fc47e0fd33b.png)
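Writing the stationarity conditions out explicitly (denoting the two parameters by $\alpha$ for the intercept and $\beta$ for the slope, which is an assumption about the notation used in the figures), setting each partial derivative of the loss to zero gives:

$$
\frac{\partial L}{\partial \alpha} = \sum_{i=1}^{n} (\alpha + \beta x_i - y_i) = 0
\qquad
\frac{\partial L}{\partial \beta} = \sum_{i=1}^{n} (\alpha + \beta x_i - y_i)\, x_i = 0
$$

These are the normal equations of ordinary least squares for a line on the plane.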
In Python, the loss can be implemented as follows (for simplicity, it accepts a single vector containing both parameters):
```python
import numpy as np

def loss(v):
    e = 0.0
    for i in range(nb_samples):
        # Squared residual of the i-th sample: (v0 + v1*x - y)^2
        e += np.square(v[0] + v[1] * X[i] - Y[i])
    return 0.5 * e
```
And the gradient can be defined...