Now, it is time to explore the data. There are many questions that we can ask, such as the following:
- Which target features would we like to model to support our goals?
- What are the useful training features for each target feature?
- Which features should be excluded from modeling because they leak information about target features (see the previous section)?
- Which features are not useful (for example, constant features, or features containing a lot of missing values)?
- How should we clean up the data? What should we do with missing values? Can we engineer new features?
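
The question about features that are not useful can be answered with a simple per-column profile. The following is a minimal sketch in plain Python, not the tooling used in this chapter. The function names `profile_features` and `drop_candidates`, the 50% missing-value threshold, and the sample rows are all hypothetical, and missing values are assumed to be encoded as `None`.

```python
from typing import Any, Dict, List, Optional

# One record of the dataset; a missing value is assumed to be None.
Row = Dict[str, Optional[Any]]


def profile_features(
    rows: List[Row], max_missing_ratio: float = 0.5
) -> Dict[str, Dict[str, Any]]:
    """Summarize each column: share of missing values and whether it is constant.

    Values are assumed to be hashable (numbers, strings, and so on).
    """
    if not rows:
        return {}
    # Collect column names in first-seen order across all rows.
    columns = list(dict.fromkeys(name for row in rows for name in row))
    report: Dict[str, Dict[str, Any]] = {}
    for name in columns:
        values = [row.get(name) for row in rows]
        present = [v for v in values if v is not None]
        missing_ratio = 1 - len(present) / len(values)
        distinct = len(set(present))
        report[name] = {
            "missing_ratio": missing_ratio,
            "distinct": distinct,
            # At most one distinct observed value: nothing to learn from.
            "constant": distinct <= 1,
            # The threshold is an arbitrary illustrative choice.
            "too_sparse": missing_ratio > max_missing_ratio,
        }
    return report


def drop_candidates(report: Dict[str, Dict[str, Any]]) -> List[str]:
    """List columns flagged as constant or mostly missing."""
    return [
        name
        for name, stats in report.items()
        if stats["constant"] or stats["too_sparse"]
    ]


# Hypothetical sample data, for illustration only.
rows = [
    {"age": 34, "country": "US", "income": None, "source": "web"},
    {"age": 51, "country": "US", "income": None, "source": "web"},
    {"age": None, "country": "US", "income": 72000, "source": "app"},
    {"age": 29, "country": "US", "income": None, "source": "web"},
]

report = profile_features(rows)
print(drop_candidates(report))
```

On this sample, `country` is flagged because it is constant and `income` because three of its four values are missing. A flagged column is only a candidate for removal: whether to drop it, impute it, or derive a new feature from it (for example, an "income is missing" indicator) still depends on the modeling goal.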