A bidimensional example
Let's consider a small dataset built by adding some uniform noise to the points belonging to a segment bounded between -6 and 6. The original equation is: y = x + 2 + n, where n is a noise term.
In the following figure, there's a plot with a candidate regression function:

As we're working on a plane, the regressor we're looking for is a function of only two parameters:

In order to fit our model, we must find the best parameters and to do that we choose an ordinary least squares approach. The loss function to minimize is:

With an analytic approach, in order to find the global minimum, we must impose:

So (for simplicity, it accepts a vector containing both variables):
import numpy as np def loss(v): e = 0.0 for i in range(nb_samples): e += np.square(v[0] + v[1]*X[i] - Y[i]) return 0.5 * e
And the gradient can be defined as:
def gradient(v): g = np.zeros(shape=2) for i in range(nb_samples): g[0] += (v[0] + v[1]*X[i] - Y[i]) g[1] += ((v[0] + v...