Aggregating features
When a feature has high cardinality, one option is to reduce that cardinality. Aggregating is one way to do so, and it may work very well in some cases. In this recipe, we will explain what aggregating is, discuss when we should use it, and then apply it.
Getting ready
When dealing with high cardinality features, one-hot encoding leads to high-dimensionality datasets. Because of the so-called curse of dimensionality, models may struggle to generalize properly on one-hot encoded high cardinality features, even with very large training datasets. Aggregating is a way to lower the dimensionality of the one-hot encoding, and thereby lower the risk of overfitting.
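To make the dimensionality argument concrete, here is a minimal sketch using pandas. The toy data, the column name `item`, and the `min_count` threshold are all hypothetical, and pooling rare values under a single "other" label is only one assumed way to aggregate, not necessarily the one used later in this recipe.

```python
import pandas as pd

# Hypothetical toy data: one categorical column with 6 distinct values.
df = pd.DataFrame({"item": ["a"] * 5 + ["b"] * 4 + ["c"] * 3 + ["d", "e", "f"]})

# One-hot encoding the raw feature yields one column per distinct value.
one_hot_full = pd.get_dummies(df["item"])

# Assumed aggregation rule: keep values seen at least `min_count` times
# and pool every rarer value under a single "other" label.
min_count = 3
counts = df["item"].value_counts()
frequent = counts[counts >= min_count].index
df["item_aggregated"] = df["item"].where(df["item"].isin(frequent), "other")

# One-hot encoding the aggregated feature yields fewer columns.
one_hot_aggregated = pd.get_dummies(df["item_aggregated"])

print(one_hot_full.shape[1])        # 6 columns
print(one_hot_aggregated.shape[1])  # 4 columns: a, b, c, other
```

On real data, the threshold would likely need tuning: set too high, it may discard informative categories; set too low, it barely reduces the dimensionality.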
There are several ways to aggregate. For example, assume that we have a database of clients that contains the “phone model” feature, which consists of...