Using PCA to graph multi-dimensional data
So far, we've been limiting ourselves to two-dimensional data. After all, the human mind has a lot of trouble dealing with more than three dimensions, and even two-dimensional visualizations of three-dimensional space can be difficult to comprehend.
However, we can use PCA to help. It projects higher-dimensional data down to lower dimensions, but it does this in a way that preserves the most significant relationships in the data. It re-projects the data on a lower dimension in a way that captures the maximum amount of variance in the data. This makes the data easier to visualize in three- or two-dimensional space, and it also provides a way to select the most relevant features in a dataset.
In this recipe, we'll take the data from the US census by race that we've worked with in previous chapters, and create a two-dimensional scatter plot of it.
Getting ready
We'll use the same dependencies in our project.clj
file as we did in Creating...