Removing redundant or unhelpful features
During data cleaning and manipulation, we often end up with features that no longer carry useful information. Perhaps we subsetted the data based on a single feature value and kept that feature, even though it now has the same value for all observations. Or, for the subset of the data we are using, two features may have identical values. Ideally, we catch these redundancies during data cleaning. If we do not, we can use the open source feature-engine package to help us.
There may also be features that are so highly correlated that a model is unlikely to make effective use of all of them. feature-engine has a transformer, DropCorrelatedFeatures, that makes it easy to remove a feature when it is highly correlated with another feature.
Getting ready
We will work extensively with the feature-engine and category_encoders packages in this...