K-nearest neighbors regression
As mentioned previously, K-nearest neighbors (KNN) can be a good alternative to linear regression when the assumptions of ordinary least squares do not hold and the numbers of observations and dimensions are small. It is also very easy to specify, so even if we do not use it for our final model, it can be valuable for diagnostic purposes.
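To show how little there is to specify, here is a minimal sketch of the idea behind KNN regression, written in plain Python rather than with any particular library. The function name knn_predict, the toy data, and the choice of Euclidean distance with an unweighted mean are assumptions made for illustration only; they are not the model built in this section.

```python
import math

def knn_predict(train_X, train_y, query, k=3):
    """Predict a target for `query` as the mean target of its k nearest training points."""
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training points")
    # Euclidean distance from the query to every training observation
    distances = [(math.dist(row, query), y) for row, y in zip(train_X, train_y)]
    # Keep the k closest observations (ties keep their original order)
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # The prediction is the unweighted average of the neighbors' targets
    return sum(y for _, y in nearest) / k

# Made-up toy data: two features per observation and one numeric target
train_X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
train_y = [1.0, 2.0, 3.0, 10.0, 12.0]

prediction = knn_predict(train_X, train_y, query=(0.2, 0.2), k=3)
print(prediction)  # 2.0, the mean of the three closest targets (1.0, 2.0, 3.0)
```

The only choices to make are the value of k and the distance measure. Because the prediction depends entirely on distances, features measured on very different scales would usually need to be standardized first; the toy data above skips that step.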
In this section, we will use KNN to build a model of the ratio of female to male incomes at the country level. We will base this on labor force participation rates, educational attainment, teenage birth frequency, and female participation in politics at the highest level. This is a good dataset to experiment with because the small sample size and feature space mean that it is unlikely to tax your system’s resources. The small number of features also makes the model easier to interpret. The drawback is that, with so few observations, it might be hard to find significant results. That being said, let’s see what we find.
Note
We will...