Reducing the dimensionality of a dataset with a principal component analysis
In the previous recipes, we presented supervised learning methods; our data points came with discrete or continuous labels, and the algorithms were able to learn the mapping from the points to the labels.
Starting with this recipe, we will present unsupervised learning methods. These methods can be helpful prior to running a supervised learning algorithm, as they give a first insight into the data.
Let's assume that our data consists of points without any labels. The goal is to discover some form of hidden structure in this set of points. Frequently, data points are intrinsically low-dimensional: a small number of features suffices to describe the data accurately. However, these features might be hidden among many other features that are not relevant to the problem. Dimensionality reduction can help us uncover this hidden structure, and this knowledge can considerably improve the performance of subsequent supervised learning algorithms.
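To make the idea concrete, here is a minimal sketch, assuming scikit-learn and the classic Iris dataset (neither of which is prescribed by the text above): PCA projects the four-dimensional flower measurements onto the two directions of largest variance, producing a two-dimensional representation that still describes most of the data.

```python
# A minimal, illustrative sketch: reducing the 4-dimensional Iris
# dataset to 2 principal components with scikit-learn's PCA.
# The dataset and variable names are assumptions for illustration.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

iris = load_iris()
X = iris.data  # shape (150, 4): four features per flower, no labels used

# Project the points onto the two directions of maximal variance.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                 # (150, 2)
print(pca.explained_variance_ratio_)   # fraction of variance kept by each component
```

The `explained_variance_ratio_` attribute indicates how much of the original variance each retained component captures, which is one way to check whether a low-dimensional representation is a faithful summary of the data.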