Removing redundant or unhelpful features
During data cleaning and manipulation, we often end up with features that are no longer meaningful. Perhaps we subsetted the data based on a single feature value and retained that feature, even though it now has the same value for every observation. Or, for the subset of the data that we are using, two features have identical values. Ideally, we catch these redundancies during data cleaning. If we do not, we can use the open source feature-engine package to help us.
Additionally, some features might be so highly correlated with one another that it is very unlikely we could build a model that uses all of them effectively. feature-engine has a transformer, DropCorrelatedFeatures, that makes it easy to remove a feature when it is highly correlated with another feature.
In this section, we will work with land temperature data, along with the NLS data. Note that we will only load temperature...