Summary
In this chapter, we took a second look at unsupervised learning by exploring principal component analysis (PCA): what it is and how to apply it in practice. We saw how it can reduce the dimensionality of a dataset that has many highly correlated variables, and how it can make that dataset easier to understand. We then applied it to real data from the National Hockey League, using the resulting principal components as inputs to a regression analysis that predicted total team points. We also explored ways to visualize both the data and the principal components.
Because PCA is an unsupervised learning technique, it requires some judgment, along with trial and error, to arrive at a solution that is both effective and acceptable to business partners. Nevertheless, it is a powerful tool for extracting latent insights from data and for supporting supervised learning.
Next, we will look at how unsupervised learning can be used to develop market basket analyses and recommendation engines, areas in which PCA can play an important role.