Aggregated view of various features
We explored the categorical and numerical data, as well as the text data. We learned how to extract various features from the text data, and we built aggregated features from some of the numerical ones. Let’s now build two more features by grouping the values of Title and Family Size. We will create two new features:
- Titles: cluster together titles that are similar (like Miss. with Mlle., or Mrs. with Mme.) or rare (like Dona., Don., Capt., Jonkheer., Rev., the Countess.), and keep the most frequent ones: Mr., Mrs., Master., and Miss.
- Family Type: create three clusters from the Family Size values: Single (for a Family Size of 1), Small (for families of 2 to 4 members), and Large (for families with more than 4 members).
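The two groupings above can be sketched in pandas as follows. This is a minimal illustration rather than the exact code used for the analysis: the small sample DataFrame, the column names `Title` and `Family Size`, the helper functions `group_title` and `family_type`, and the label `Rare` for infrequent titles are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical sample; the real data frame and its column names may differ.
df = pd.DataFrame({
    "Title": ["Mr.", "Mlle.", "Mme.", "Capt.", "Master.", "Miss.", "Mrs.", "Rev."],
    "Family Size": [1, 2, 4, 1, 5, 3, 7, 1],
})

# Similar titles are merged into their most frequent equivalent.
SIMILAR_TITLES = {"Mlle.": "Miss.", "Mme.": "Mrs."}
# The most frequent titles are kept as they are.
FREQUENT_TITLES = {"Mr.", "Mrs.", "Master.", "Miss."}


def group_title(title):
    """Map a raw title to one of the frequent titles, or to 'Rare' (assumed label)."""
    title = SIMILAR_TITLES.get(title, title)
    return title if title in FREQUENT_TITLES else "Rare"


def family_type(size):
    """Cluster a family size into Single (1), Small (2 to 4), or Large (more than 4)."""
    if size == 1:
        return "Single"
    if size <= 4:
        return "Small"
    return "Large"


df["Titles"] = df["Title"].map(group_title)
df["Family Type"] = df["Family Size"].map(family_type)
print(df)
```

Keeping the mappings in small named functions makes the thresholds and the list of frequent titles easy to adjust if the value counts in the actual data suggest different clusters.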
Then, we show on a single graph several of the simple and derived features that we found to have important predictive value (see Figure 3.26).