Chapter 9. Principal Components Analysis
"Some people skate to the puck. I skate to where the puck is going to be."
--Wayne Gretzky
This is the second chapter to focus on unsupervised learning techniques. In the prior chapter, we covered cluster analysis, which groups similar observations. In this chapter, we will see how to reduce dimensionality and improve our understanding of the data by grouping correlated variables with Principal Components Analysis (PCA). Then, we will use the principal components in supervised learning.
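To make the idea of grouping correlated variables concrete, here is a minimal sketch in Python. It assumes synthetic data and just two standardized variables; the helper names `pearson` and `explained_variance_two_vars` are hypothetical and exist only for this illustration. For two standardized variables with correlation r, the correlation matrix [[1, r], [r, 1]] has eigenvalues 1 + |r| and 1 - |r|, so the first principal component captures (1 + |r|) / 2 of the total variance.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def explained_variance_two_vars(r):
    """Share of variance captured by each principal component of two
    standardized variables whose correlation is r.

    The eigenvalues of [[1, r], [r, 1]] are 1 + |r| and 1 - |r|, and they
    sum to 2 (the total variance of two standardized variables).
    """
    first = (1 + abs(r)) / 2
    return first, 1 - first


# Synthetic data (an assumption for illustration): y is x plus a little noise,
# so the two variables are strongly correlated.
random.seed(42)
x = [random.gauss(0, 1) for _ in range(500)]
y = [a + random.gauss(0, 0.3) for a in x]

r = pearson(x, y)
pc1_share, pc2_share = explained_variance_two_vars(r)

print(f"correlation between x and y: {r:.3f}")
print(f"variance captured by component 1: {pc1_share:.1%}")
print(f"variance captured by component 2: {pc2_share:.1%}")
```

When the correlation is high, nearly all of the variance lands on the first component, so the two original variables can be summarized by one with little loss. With uncorrelated variables (r near 0), each component would capture about half, and there would be nothing to gain from the reduction.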
In many datasets, particularly in the social sciences, you will see many variables that are highly correlated with each other. Such a dataset may additionally suffer from high dimensionality or, as it is known, the curse of dimensionality. This is a problem because the number of samples needed to estimate a function grows exponentially with the number of input features. In such datasets, there may...