Introducing EDA
Exploratory Data Analysis (EDA) is one of the preliminary steps in the life cycle of a data science project. It helps us understand the underlying structure of our data so that we can extract meaningful information from it.
We can think about the EDA phase as a small data science project, in which the real data analysis part (model definition and evaluation) is missing. Therefore, a typical EDA process is composed of the steps shown in the following figure:
Figure 2.1 – The main steps of an EDA process
As the figure shows, an EDA process consists of the following steps:
- Problem setting
- Data preparation
- Preliminary data analysis
- Preliminary results
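To make the four steps more concrete, here is a minimal sketch in Python that walks through them on a tiny in-memory dataset. It is only one possible reading of each step. The records, the question, and the helper names prepare and analyze are hypothetical and chosen purely for illustration; only the Python standard library is used.

```python
from statistics import mean, median

# Hypothetical toy records, invented for illustration only.
raw_rows = [
    {"city": "Rome", "temperature": "21.5"},
    {"city": "Milan", "temperature": ""},  # missing value
    {"city": "Turin", "temperature": "18.0"},
    {"city": "Naples", "temperature": "24.5"},
]

# 1. Problem setting: state a question the dataset may be able to answer.
question = "What is a typical temperature across the recorded cities?"


# 2. Data preparation: drop rows with missing values and convert types.
def prepare(rows):
    return [
        {"city": row["city"], "temperature": float(row["temperature"])}
        for row in rows
        if row["temperature"] != ""
    ]


# 3. Preliminary data analysis: compute simple summary statistics.
def analyze(rows):
    temperatures = [row["temperature"] for row in rows]
    return {
        "count": len(temperatures),
        "mean": mean(temperatures),
        "median": median(temperatures),
        "min": min(temperatures),
        "max": max(temperatures),
    }


# 4. Preliminary results: report the summary next to the original question.
clean_rows = prepare(raw_rows)
summary = analyze(clean_rows)

print(question)
for name, value in summary.items():
    print(f"{name}: {value}")
```

In a real project, each step would be far richer than this, but the overall flow from a question, to cleaned data, to a first summary stays the same.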
Let's investigate each step separately, starting from the first step – problem setting.
Problem setting
Problem setting is the ability to define which kinds of questions our dataset can answer...