Preparing Data for Machine Learning: Feature Selection, Feature Engineering, and Dimensionality Reduction
In this section of the book, we'll be covering machine learning (ML) methods. These methods are used to extract patterns from data and, sometimes, to predict future events. The inputs to ML algorithms are called features, and we can modify our set of features using feature engineering, feature selection, and dimensionality reduction. These techniques can often improve our ML models dramatically. In this chapter, we'll cover the following topics:
- Feature selection methods, including univariate statistical methods (such as correlation, mutual information score, and chi-squared) and other approaches
- Feature engineering methods for categorical data, datetime data, and outliers
- Using mathematical transforms for feature engineering
- Dimensionality reduction using PCA
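As a rough preview of how the topics above fit together, here is a minimal sketch using scikit-learn and NumPy. The synthetic dataset, the `income` variable, and the choices of `k=4` and `n_components=2` are assumptions made purely for illustration, not values taken from this chapter.

```python
from functools import partial

import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Synthetic data (illustrative only): 200 samples, 10 features,
# of which 3 are informative and 2 are redundant.
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=3, n_redundant=2, random_state=42
)

# Feature selection: keep the 4 features with the highest
# mutual information score with respect to the target.
score_func = partial(mutual_info_classif, random_state=42)
selector = SelectKBest(score_func=score_func, k=4)
X_selected = selector.fit_transform(X, y)

# Feature engineering with a mathematical transform: a log transform
# applied to a hypothetical right-skewed feature.
rng = np.random.default_rng(42)
income = rng.lognormal(mean=10, sigma=1, size=200)  # hypothetical feature
log_income = np.log(income)

# Dimensionality reduction: project the 10 features onto 2 principal components.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_selected.shape, log_income.shape, X_reduced.shape)
```

Each step produces a different view of the same data: fewer original features, a transformed feature, and a smaller set of new features built from combinations of the originals.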
Let's get started with the...