Robust linear regression
Assuming our data follows a Gaussian distribution is perfectly reasonable in many situations. By assuming Gaussianity, we are not necessarily saying that our data is really Gaussian; instead we are saying that it is a reasonable approximation for our current problem. As we saw in the previous chapter, sometimes this Gaussian assumption fails, for example in the presence of outliers. We learned that using a Student's t-distribution is a way to effectively deal with outliers and get a more robust inference. The very same idea can be applied to linear regression.
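To get a feel for why heavier tails help before we turn to real data, here is a minimal sketch in plain NumPy. It is not the Bayesian model built in this chapter. It assumes synthetic data (a straight line plus small Gaussian noise and one gross outlier), fixes the degrees of freedom at nu = 4, and finds a maximum-likelihood line under Student's t noise with a standard EM / iteratively reweighted least squares scheme. The helper names fit_ols and fit_student_t are hypothetical and were introduced only for this illustration.

```python
import numpy as np


def fit_ols(x, y):
    """Ordinary least squares fit; returns [intercept, slope]."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def fit_student_t(x, y, nu=4.0, n_iter=200):
    """Maximum-likelihood line under Student's t noise (nu held fixed).

    EM scheme: each point gets a weight that shrinks as its scaled
    residual grows, then a weighted least squares problem is solved.
    """
    X = np.column_stack([np.ones_like(x), x])
    coef = fit_ols(x, y)                      # start from the Gaussian fit
    sigma2 = np.mean((y - X @ coef) ** 2)     # initial scale estimate
    for _ in range(n_iter):
        resid = y - X @ coef
        # E-step: large residuals receive small weights
        w = (nu + 1.0) / (nu + resid ** 2 / sigma2)
        # M-step: weighted least squares, then update the scale
        WX = X * w[:, None]
        coef = np.linalg.solve(X.T @ WX, WX.T @ y)
        sigma2 = np.sum(w * (y - X @ coef) ** 2) / len(y)
    return coef


# Synthetic data (an assumption for illustration): true slope is 0.5
rng = np.random.default_rng(0)
x = np.arange(20, dtype=float)
y = 2.0 + 0.5 * x + rng.normal(0.0, 0.3, size=x.size)
y[-1] += 15.0                                 # one gross outlier

ols_coef = fit_ols(x, y)
t_coef = fit_student_t(x, y)
print("Gaussian (OLS) slope:", ols_coef[1])
print("Student's t slope:   ", t_coef[1])
```

The Gaussian fit is dragged toward the outlier, while the Student's t fit effectively ignores it and stays close to the slope used to generate the data. The Bayesian version developed in this section follows the same intuition, but it places priors on the parameters and returns a full posterior instead of a single point estimate.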
To exemplify the robustness that a Student's t-distribution brings to a linear regression, we are going to use a very simple dataset: the third data group from the Anscombe quartet. If you do not know what the Anscombe quartet is, look it up on Wikipedia later. We can load it from seaborn:
ans = sns.load_dataset('anscombe')
x_3 = ans[ans.dataset == 'III']['x'].values
y_3 = ans[ans.dataset...