Feature extraction with PCA
PCA can be used for dimensionality reduction in preparation for a model we will run subsequently. Although PCA is not, strictly speaking, a feature selection tool, we can run it in much the same way we ran the wrapper feature selection methods in Chapter 5, Feature Selection. After some preprocessing (such as handling outliers), we generate the components, which we can then use as our new features. Sometimes we do not use these components in a model at all. Instead, we generate them mainly to help us visualize our data better.
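The workflow just described can be sketched in a few lines. This is only a minimal illustration, assuming scikit-learn and a small synthetic dataset rather than real data. The array `X`, the choice of two components, and the use of `StandardScaler` as the preprocessing step are all assumptions made for the example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption for illustration only):
# 200 rows and 6 correlated features driven by 2 latent factors plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
loadings = rng.normal(size=(2, 6))
X = latent @ loadings + 0.1 * rng.normal(size=(200, 6))

# Standardize the features, then project them onto 2 principal components.
pipe = make_pipeline(StandardScaler(), PCA(n_components=2))
components = pipe.fit_transform(X)

# Share of the total variance captured by each retained component.
explained = pipe.named_steps["pca"].explained_variance_ratio_

print(components.shape)
print(explained.round(3))
```

The `components` array could then be passed to a downstream model as its features, or its two columns could be plotted against each other to inspect the data visually.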
To illustrate the use of PCA, we will work with data on National Basketball Association (NBA) games. The dataset has statistics from each NBA game from the 2017/2018 season through the 2020/2021 season. This includes the home team; whether the home team won; the visiting team; shooting percentages for visiting and home teams; turnovers, rebounds, and assists by both teams; and a number of other measures.
Note
NBA game data...