Feature extraction and principal component analysis
Sometimes we have an overwhelming number of columns and not nearly enough rows to handle that quantity of columns.
A great example of this arose when we were looking at the send cash now phrase in our Naïve Bayes example. We had zero instances of texts containing that exact phrase, so instead we turned to a naïve assumption that allowed us to extrapolate a probability for each of our categories.
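To recall the shape of that assumption (shown here schematically rather than with the exact figures from that example), Naïve Bayes treats the words as conditionally independent, so the probability of the whole phrase can be pieced together from the probabilities of the individual words:

    P(send cash now | spam) ≈ P(send | spam) × P(cash | spam) × P(now | spam)

Each of those single-word probabilities can be estimated from the training texts even though the three-word phrase itself never appears.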
The reason we had this problem in the first place is something called the curse of dimensionality.
The curse of dimensionality says that as we introduce and consider new feature columns, we need nearly exponentially more rows (data points) in order to fill in the empty space that those new columns create.
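A minimal sketch of this effect on synthetic data (the row and column counts here are arbitrary choices): hold the number of rows fixed, keep adding columns, and each point's nearest neighbour drifts further and further away.

    import numpy as np
    from scipy.spatial.distance import cdist

    # Fixed number of rows, increasing number of columns: the same data
    # covers less and less of the space, so every point's nearest neighbour
    # ends up further away -- the "empty space" described above.
    rng = np.random.default_rng(0)
    n_rows = 500

    for n_cols in (2, 10, 100, 1000):
        X = rng.random((n_rows, n_cols))   # random points in the unit hypercube
        d = cdist(X, X)                    # all pairwise Euclidean distances
        np.fill_diagonal(d, np.inf)        # ignore each point's distance to itself
        print(f"{n_cols:>4} columns -> mean nearest-neighbour distance: "
              f"{d.min(axis=1).mean():.2f}")
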
Consider an example where we attempt to use a learning model that relies on the distance between points, applied to a corpus of 4,086 pieces of text that has been run through CountVectorizer (a minimal sketch of that vectorization step follows below). Let's assume that...
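As a quick illustration of what CountVectorizer does to the shape of a dataset (the corpus below is tiny and made up; it is not the 4,086-document corpus from this example), notice how even a handful of short texts already produce more columns than rows, because every unique word becomes its own column:

    from sklearn.feature_extraction.text import CountVectorizer

    # A tiny, made-up corpus -- four documents only
    texts = [
        "send cash now",
        "please send the report now",
        "cash prizes await you now",
        "the report is attached, please review",
    ]

    vectorizer = CountVectorizer()
    X = vectorizer.fit_transform(texts)   # sparse document-term matrix

    print("rows (documents):", X.shape[0])        # 4
    print("columns (unique words):", X.shape[1])  # already more than 4
    print(vectorizer.get_feature_names_out())
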