Introduction
Suppose we want to predict whether a particular earthquake caused a tsunami. How do we decide which model to use? What do we know about the data we have? Nothing! And if we don't know and understand our data, chances are we'll end up building a model that's neither interpretable nor reliable.
In data science, a thorough understanding of the data we're dealing with is what lets us generate highly informative features and, consequently, build accurate and powerful models.
To gain this understanding, we perform an exploratory analysis to see what the data can tell us about the relationships between the features and the target variable. Getting to know our data also helps us interpret the model we build and identify ways to improve its accuracy.
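As a rough illustration of what looking at the relationship between features and the target can mean in practice, here is a minimal sketch using only the Python standard library. The records, the feature names (magnitude, depth_km) and the helper functions are all hypothetical and invented for this example; they are not taken from any real earthquake dataset.

```python
from statistics import mean

# Hypothetical toy records: (magnitude, depth_km, caused_tsunami).
# These values are made up for illustration, not real earthquake data.
records = [
    (7.8, 20.0, 1),
    (6.1, 150.0, 0),
    (8.2, 15.0, 1),
    (5.9, 80.0, 0),
    (7.1, 35.0, 1),
    (6.4, 200.0, 0),
]
feature_names = ["magnitude", "depth_km"]


def class_means(rows, feature_index):
    """Mean of one feature, split by the binary target in the last column."""
    positives = [r[feature_index] for r in rows if r[-1] == 1]
    negatives = [r[feature_index] for r in rows if r[-1] == 0]
    return mean(positives), mean(negatives)


def pearson(xs, ys):
    """Pearson correlation; with a 0/1 target this is the point-biserial correlation."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


target = [r[-1] for r in records]
summary = {}
for i, name in enumerate(feature_names):
    pos_mean, neg_mean = class_means(records, i)
    corr = pearson([r[i] for r in records], target)
    summary[name] = {
        "mean_if_tsunami": pos_mean,
        "mean_if_not": neg_mean,
        "corr": corr,
    }
    print(f"{name}: tsunami mean={pos_mean:.2f}, "
          f"no-tsunami mean={neg_mean:.2f}, corr={corr:.2f}")
```

Comparing per-class means and a simple correlation is only one of many ways to start; on a real dataset the same kind of summary is typically produced with a dataframe library and complemented with plots.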
The approach we take to achieve this is to allow the data to reveal its structure or model, which helps gain some new, often unsuspected...