Summary
In this chapter, we covered the basics of feature selection and feature engineering. We also covered some basic feature cleaning and preparation steps, such as converting strings to dates and checking for and cleaning outliers. We ended with a dimensionality reduction technique known as Principal Component Analysis (PCA), which linearly combines features into a smaller set of PCA dimensions (components) that we can use in later analysis.
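As a quick recap of those preparation steps, here is a minimal sketch using pandas and scikit-learn. The DataFrame `df`, its column names, and the IQR-based clipping rule are illustrative assumptions, not the chapter's exact code:

```python
import pandas as pd
from sklearn.decomposition import PCA

# Hypothetical data: a date stored as a string plus two numeric features.
df = pd.DataFrame({
    'signup_date': ['2021-01-15', '2021-02-20', '2021-03-05'],
    'feature_a': [1.0, 2.5, 100.0],
    'feature_b': [0.5, 0.7, 0.9],
})

# Convert the string column to datetime objects.
df['signup_date'] = pd.to_datetime(df['signup_date'])

# Check for outliers with the IQR rule and clip extreme values.
q1, q3 = df['feature_a'].quantile([0.25, 0.75])
iqr = q3 - q1
df['feature_a'] = df['feature_a'].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

# Linearly combine the numeric features into PCA dimensions.
pca = PCA(n_components=2)
components = pca.fit_transform(df[['feature_a', 'feature_b']])
print(pca.explained_variance_ratio_)
```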
The chapter began by introducing the three main types of ML: supervised, unsupervised, and reinforcement learning. Then we learned why it's important to prune down our features: the curse of dimensionality and overfitting. With many features, a supervised ML model is more likely to overfit, fitting to noise in the data rather than the underlying signal. Feature selection removes some of these features, reducing the chance of overfitting and making our models run faster. We saw several ways to perform feature selection.
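For example, one common filter-style selection method (not necessarily one of the chapter's exact techniques) is to drop near-constant features by their variance. The sketch below uses scikit-learn's `VarianceThreshold` on a hypothetical feature matrix; the threshold value is an illustrative assumption:

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Hypothetical feature matrix: the second column is nearly constant.
X = np.array([
    [1.0, 0.0, 3.1],
    [2.0, 0.0, 2.9],
    [3.0, 0.1, 3.0],
    [4.0, 0.0, 3.2],
])

# Drop low-variance features, which carry little information.
selector = VarianceThreshold(threshold=0.01)
X_reduced = selector.fit_transform(X)
print(selector.get_support())  # boolean mask of the kept columns
```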