4.5 Robust regression
I once ran a complex simulation of a molecular system. At each step of the simulation, I needed to fit a linear regression as an intermediate step. I had theoretical and empirical reasons to think that my Y was conditionally Normal given my Xs, so I decided a simple linear regression would do the trick. But from time to time the simulation generated a few values of Y that were way above or below the bulk of the data. This completely ruined my simulation, and I had to restart it.
Values that are very different from the bulk of the data are usually called outliers. My simulations failed because the outliers were pulling the regression line away from the bulk of the data, and when I passed this estimate to the next step of the simulation, the whole thing just halted. I solved this with the help of our good friend the Student’s t-distribution, which, as we saw in Chapter 2, has heavier tails than the Normal distribution. This means that...