Introduction
Say we have a problem statement that involves predicting whether a particular earthquake caused a tsunami. How do we decide which model to use? What do we know about the data we have? Nothing! But if we don't know and understand our data, chances are we'll end up building a model that's not very interpretable or reliable. In data science, a thorough understanding of the data is what lets us generate highly informative features and, consequently, build accurate and powerful models. To acquire this understanding, we perform an exploratory analysis of the data to see what it can tell us about the relationships between the features and the target variable (the value we are trying to predict using the other variables). Getting to know our data also helps us interpret the model we build and identify ways to improve its accuracy. The approach we take to achieve this is...