What is the goal of EDA?
The objective of EDA is to make sure that the dataset used later in more complex processes is, first of all, clean: it contains no missing values and no outliers that could distort subsequent analyses. It is also important in this phase to select the variables that actually carry information and to drop those that mostly add noise, since this removes possible sources of inaccuracy from the conclusions that later processes reach. Finally, this stage is where you study the associations between variables and gain insights from the data, which help justify any more complex processing applied later.
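The cleaning and selection checks described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a prescribed procedure: the data is assumed to be a dict mapping hypothetical column names to lists, with missing values stored as None; outliers are flagged with Tukey's fences (values beyond 1.5 * IQR from the quartiles), which is one common rule among several; and a near-constant column is used as one simple example of a variable that brings little information. In pandas, `DataFrame.isna().sum()` and `Series.quantile` provide the same building blocks.

```python
from statistics import pvariance, quantiles


def missing_counts(columns):
    """Count missing (None) entries in each column."""
    return {name: sum(v is None for v in values) for name, values in columns.items()}


def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    present = [v for v in values if v is not None]
    if len(present) < 2:
        return []  # too few values to estimate quartiles
    q1, _, q3 = quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in present if v < low or v > high]


def low_variance_columns(columns, threshold=1e-8):
    """Names of columns whose variance (ignoring missing values) is below threshold."""
    result = []
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        if len(present) < 2 or pvariance(present) < threshold:
            result.append(name)
    return result


# Hypothetical example data; None marks a missing value.
columns = {
    "age": [23, 25, None, 31, 29, 27, 120],
    "income": [30, 32, 35, None, 38, 40, 41],
    "country_code": [1, 1, 1, 1, 1, 1, 1],
}
print(missing_counts(columns))        # {'age': 1, 'income': 1, 'country_code': 0}
print(iqr_outliers(columns["age"]))   # [120]
print(low_variance_columns(columns))  # ['country_code']
```

Whether a flagged value is an error to remove or a genuine extreme to keep is a judgment call the code cannot make; the sketch only surfaces candidates for review.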
In short, EDA consists of the following phases:
- Understanding your data
- Cleaning your data
- Discovering associations between variables
Let's look in detail at the types of analysis each phase involves.
Understanding your data
In this first phase, it is essential to understand the...