Feature engineering
Feature engineering can take up a large portion of a data scientist’s time, and it can be just as important to their success as choosing the right machine learning algorithm, and sometimes even more so. In this section, we will dive deeper into feature engineering, which can be considered both an art and a science.
We will use the Titanic dataset available on OpenML (https://www.openml.org/search?type=data&sort=runs&id=40945) for our examples in this section. This dataset contains information about passengers aboard the Titanic, including demographic data, ticket class, fare, and whether they survived the sinking of the ship.
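If you want to explore the same dataset outside the notebook, the following is a minimal sketch of one way to fetch it, assuming scikit-learn and pandas are installed and the machine has network access. The helper name load_titanic and the constant TITANIC_DATA_ID are hypothetical names chosen for illustration; the only value taken from the text is the dataset ID 40945 in the OpenML URL. The notebook itself may load the data differently.

```python
from sklearn.datasets import fetch_openml

# Dataset ID taken from the OpenML URL referenced in the text.
TITANIC_DATA_ID = 40945


def load_titanic(data_id=TITANIC_DATA_ID):
    """Download the dataset from OpenML and return (features, target).

    Hypothetical helper for illustration only, not the notebook's own code.
    """
    # as_frame=True requests pandas objects, so pandas must be installed.
    bunch = fetch_openml(data_id=data_id, as_frame=True)
    return bunch.data, bunch.target


# Example usage (the first call downloads the data; later calls use a local cache):
# X, y = load_titanic()
# print(X.shape)
# print(X.dtypes)
```

Inspecting the column types first is a reasonable starting point, since the choice of feature engineering technique usually depends on whether a column is numeric, categorical, or free text.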
In the Chapter-07 directory in JupyterLab on your Vertex AI Workbench Notebook Instance, open the feature-eng-titanic.ipynb notebook and choose Python (Local) as the kernel. Again, run each cell in the notebook by selecting the cell and pressing Shift + Enter on your keyboard.
In this notebook, the...