Feature extraction and PCA
A common problem when working with data, particularly in ML, is having an overwhelming number of columns (features) and not enough rows (observations) to support them.
A great example of this came up earlier with the "send cash now" phrase in our naïve Bayes work. Remember how we had literally zero texts containing that exact phrase? In that case, we turned to a naïve assumption that allowed us to extrapolate a probability for both of our categories.
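As a quick refresher on how that assumption lets us work around the missing phrase, here is a minimal sketch; the per-word probabilities below are hypothetical placeholders, not the values from the earlier chapter:

```python
# Hypothetical per-word probabilities estimated from training texts.
# The naive (independence) assumption lets us multiply them even though
# the exact phrase "send cash now" never appeared in the training data.
p_word_given_spam = {'send': 0.1, 'cash': 0.2, 'now': 0.3}
p_word_given_ham = {'send': 0.05, 'cash': 0.01, 'now': 0.1}

def naive_phrase_probability(words, word_probs):
    """P(phrase | class) ~= product of P(word | class) under the naive assumption."""
    prob = 1.0
    for word in words:
        prob *= word_probs[word]
    return prob

phrase = ['send', 'cash', 'now']
print(naive_phrase_probability(phrase, p_word_given_spam))  # 0.006
print(naive_phrase_probability(phrase, p_word_given_ham))   # 5e-05
```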
We ran into this problem in the first place because of something called the curse of dimensionality (COD). The COD basically says that as we introduce new feature columns, we need exponentially more rows (data points) to cover the increased number of possibilities.
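To get a feel for why the growth is exponential, consider the simplest case of binary feature columns: every new column doubles the number of distinct rows we could possibly observe. A quick sketch (the feature counts are chosen purely for illustration):

```python
# Each binary feature column doubles the number of distinct rows we could
# possibly see, so "covering" the feature space takes exponentially more data.
for n_features in [2, 5, 10, 20, 30]:
    possible_rows = 2 ** n_features
    print(f"{n_features:>2} binary features -> {possible_rows:,} possible rows")
```

With just 30 binary columns there are over a billion possible rows, far more than any realistic dataset will contain.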
Consider an example where we attempt to use a learning model that relies on the distance between points, applied to a corpus of 4,086 pieces of text, all of which has been count...