Given the observation matrix and a real-valued label, it is tempting to treat the problem as a regression. In this case, the regression is very simple: from a numerical vector, we want to predict a numerical value. This approach is not ideal, however. Treating the problem as a plain regression forces the algorithm to treat each feature as independent, whereas the features are in fact correlated, since they are windows of the same time series. Let's start anyway with this simple assumption (each feature is independent); in the next chapter, we will show how performance can be improved by exploiting the temporal correlation.
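To make the "numerical vector in, numerical value out" setup concrete, here is a minimal sketch using NumPy only. It rests on assumptions that the text does not state: each row of the observation matrix is a sliding window of the series, the label is the value that immediately follows the window, and the regressor is ordinary least squares. The helper names (`make_windows`, `fit_linear_regression`, `predict`) and the toy series are hypothetical and chosen for illustration only.

```python
import numpy as np


def make_windows(series, window_size):
    """Build an observation matrix and labels from a 1-D series.

    Assumption: row i holds series[i : i + window_size], and its label
    is the next value, series[i + window_size].
    """
    series = np.asarray(series, dtype=float)
    n_rows = len(series) - window_size
    X = np.stack([series[i:i + window_size] for i in range(n_rows)])
    y = series[window_size:]
    return X, y


def add_bias_column(X):
    """Append a column of ones so the model can learn an intercept."""
    return np.hstack([X, np.ones((X.shape[0], 1))])


def fit_linear_regression(X, y):
    """Ordinary least squares; every column is treated as a plain feature."""
    weights, *_ = np.linalg.lstsq(add_bias_column(X), y, rcond=None)
    return weights


def predict(X, weights):
    """Apply the fitted weights to an observation matrix."""
    return add_bias_column(X) @ weights


# Toy series: a noiseless straight line, which a linear model can fit.
series = np.arange(20, dtype=float)
X, y = make_windows(series, window_size=3)

weights = fit_linear_regression(X, y)
predictions = predict(X, weights)
```

Note that nothing in this sketch tells the model that neighboring columns are consecutive time steps: shuffling the columns of `X` consistently across all rows would produce an equally good fit, which is exactly the limitation described above.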
In order to evaluate the model, we now create a function that, given the observation matrix, the true labels, and the predicted ones, will output the metrics (in terms of mean square...