Linear regression
We start with the simplest and most widely used model in statistics, which consists of fitting a straight line to a dataset. We assume we have a data set of pairs (xᵢ, yᵢ) that are i.i.d. and we want to fit a model such that:
y = βx + β₀ + ϵ
Here, ϵ is Gaussian noise. If we assume that xᵢ ∈ ℝⁿ, then the expected value can also be written as:

ŷ = β₀ + β₁x₁ + ⋯ + βₙxₙ
Or, in matrix notation, we can also include the intercept β₀ in the vector of parameters and add a column of 1s to X, such that X = (1, x₁, …, xₙ), to finally obtain:
ŷ = Xᵀβ
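As a small illustration of this notation (the numbers below are arbitrary and only meant to show the mechanics), we can stack the observations as the rows of a matrix, prepend the column of 1s, and compute ŷ = Xβ directly in R:

# Arbitrary example values; each row of X is one observation
x <- c(1.0, 2.5, 4.0)        # three observations of a single feature
X <- cbind(1, x)             # prepend the column of 1s, so X = (1, x)
beta <- c(0.5, 2.0)          # hypothetical parameters (beta_0, beta)
y_hat <- X %*% beta          # predictions y_hat = X beta, one per row
y_hat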
The following figure shows an example (in one dimension) of a data set with its corresponding regression line:
In R, fitting a linear model is an easy task, as we will see now. Here, we generate a small artificial data set in order to reproduce the previous figure. In R, the function to fit a linear model is lm(), and it is the workhorse of this language in many situations. Of course, later...
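A minimal sketch of such a session, assuming arbitrarily simulated data (the sample size, the slope of 1.5, the intercept of 2 and the noise level are invented purely for illustration), could look like this:

set.seed(42)                            # make the artificial data reproducible
x <- runif(100, 0, 10)                  # 100 artificial predictor values
y <- 1.5 * x + 2 + rnorm(100, sd = 2)   # straight line plus Gaussian noise
df <- data.frame(x = x, y = y)

fit <- lm(y ~ x, data = df)             # fit the model y = beta*x + beta_0 + eps
summary(fit)                            # estimated coefficients, residuals, R^2

plot(df$x, df$y, xlab = "x", ylab = "y")  # scatter plot of the data
abline(fit, col = "red")                  # overlay the fitted regression line

The call to summary(fit) prints the estimated coefficients together with their standard errors, and abline(fit) draws the fitted line over the scatter plot, producing a figure similar to the one shown above.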