Exploratory Data Analysis
Between data cleaning and modeling or formal statistical analysis lies an intermediate step known as exploratory data analysis (EDA), a fundamental part of data science. EDA is the primary way to understand and make sense of a dataset. It offers insight into the population from which the sample was drawn and helps turn raw data into actionable information for businesses. EDA can include various techniques and methods:
- Data summary or descriptive statistics: Used to summarize the central tendency (mean, median, mode) and the spread (standard deviation, range, quartiles) of the variables in the dataset.
- Data visualization: Graphical techniques such as histograms, box plots, scatter plots, and line plots are used to visualize the data, aiding in pattern identification, outlier detection, and understanding the relationships between variables. Data visualization is also particularly effective for presenting conclusions to a non-technical audience.
- Data exploration: Helps us understand the distribution...
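To make the descriptive-statistics and box-plot ideas concrete, here is a minimal sketch using only the Python standard library. The helper names `describe` and `iqr_outliers` are hypothetical, the sample values are made-up toy data, and the 1.5 × IQR cutoff is the conventional box-plot (Tukey) whisker rule, assumed here rather than prescribed by the text.

```python
import statistics


def describe(values):
    """Summarize central tendency and spread of a numeric sample.

    Quartiles use the 'inclusive' method (linear interpolation
    between sorted data points); stdev is the sample standard deviation.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "q1": q1,
        "q3": q3,
        "max": max(values),
    }


def iqr_outliers(values, k=1.5):
    """Return points outside [Q1 - k*IQR, Q3 + k*IQR].

    k=1.5 is the usual box-plot whisker convention (an assumption here).
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


if __name__ == "__main__":
    sample = [2, 4, 4, 5, 7, 9, 10, 12, 40]  # hypothetical toy data
    summary = describe(sample)
    # The single large value pulls the mean above the median.
    print(summary["mean"], summary["median"])
    print(iqr_outliers(sample))  # -> [40]
```

Comparing the mean with the median already hints at the skew that a box plot would show visually. Libraries such as pandas provide `DataFrame.describe()` for the same kind of summary; the standard-library version simply makes the arithmetic visible.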