This is the second chapter to focus on unsupervised learning techniques. The previous chapter covered cluster analysis, which groups similar observations. In this chapter, we will use principal components analysis (PCA) to reduce the dimensionality of our data and improve our understanding of it by grouping correlated variables. Then, we will use the principal components in supervised learning.
In many datasets, particularly in the social sciences, you will see many variables that are highly correlated with each other. Such datasets may additionally suffer from high dimensionality, the source of what is better known as the curse of dimensionality. This is a problem because the number of samples needed to estimate a function grows exponentially...