In this chapter, we examined the problem of dimensionality reduction. High-dimensional data suffers from the curse of dimensionality: estimators require many more samples to learn to generalize from high-dimensional data. We mitigated these problems using a technique called principal component analysis (PCA), which reduces a high-dimensional, possibly correlated dataset to a lower-dimensional set of linearly uncorrelated principal components by projecting the data onto a lower-dimensional subspace. We used PCA to visualize the four-dimensional Iris dataset in two dimensions, and to build a face recognition system.
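As a brief recapitulation of the workflow described above, the following sketch projects the four-dimensional Iris dataset onto its first two principal components with scikit-learn; the variable names are illustrative, not part of any fixed API.

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Load the four-dimensional Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Project the data onto its first two principal components
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                # (150, 2)
print(pca.explained_variance_ratio_)  # proportion of variance retained by each component

Plotting X_reduced colored by y reproduces the two-dimensional visualization of the Iris classes discussed in this chapter; explained_variance_ratio_ indicates how much of the dataset's variance each retained component captures.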
This chapter concludes the book. We have discussed a variety of models, learning algorithms, and performance measures, as well as their implementations in scikit-learn. In the first chapter, we described machine learning programs as those that learn from experience...