Understanding the Science Behind EDA
In layman's terms, we can define EDA as the science of understanding data. More formally, it is the process of analyzing and exploring datasets to summarize their characteristics, properties, and latent relationships using statistical, visual, or analytical techniques, or a combination of these.
To cement our understanding, let's break this definition down further. A dataset is typically a combination of numeric and categorical features. To study the data, we might need to explore features individually; to study relationships, we might need to explore features together. Depending on the number and types of features involved, we encounter different types of EDA.
To simplify, we can broadly classify the process of EDA as follows:
Univariate analysis: Studying a single feature
Bivariate analysis: Studying the relationship between two features
Multivariate analysis: Studying the relationship between more than two features
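To make the three categories concrete, here is a minimal sketch in Python that uses only the standard library. The toy dataset, the feature names (age, income, segment), and the helper functions column and pearson are all hypothetical and chosen purely for illustration; in practice, a library such as pandas would typically do this work.

```python
from collections import Counter
from statistics import mean, median, stdev

# Hypothetical toy dataset: two numeric features and one categorical feature.
rows = [
    {"age": 20, "income": 22, "segment": "A"},
    {"age": 30, "income": 38, "segment": "A"},
    {"age": 40, "income": 41, "segment": "B"},
    {"age": 50, "income": 59, "segment": "B"},
]


def column(name):
    """Return all values of one feature as a list."""
    return [row[name] for row in rows]


def pearson(xs, ys):
    """Pearson correlation coefficient between two numeric features."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


ages, incomes = column("age"), column("income")

# Univariate analysis: summarize one feature at a time.
age_summary = {"mean": mean(ages), "median": median(ages), "stdev": stdev(ages)}
segment_counts = Counter(column("segment"))

# Bivariate analysis: relationship between two features.
age_income_corr = pearson(ages, incomes)

# Multivariate analysis: three features together, here the average
# age and income within each segment.
grouped = {}
for row in rows:
    grouped.setdefault(row["segment"], []).append(row)

segment_profile = {
    seg: {
        "age": mean(r["age"] for r in members),
        "income": mean(r["income"] for r in members),
    }
    for seg, members in grouped.items()
}

print(age_summary)
print(segment_counts)
print(round(age_income_corr, 3))
print(segment_profile)
```

The numeric feature is summarized with the mean, median, and standard deviation, and the categorical feature with frequency counts, because the appropriate summary depends on the feature type. The same distinction carries over to the bivariate and multivariate cases.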
For now, we will restrict the scope...