Understanding exploratory data analysis
When you first work with a dataset, you usually know little about the data it contains. To build that understanding, data analysts and data scientists perform exploratory data analysis (EDA).
EDA is the process of using descriptive statistics and data visualization tools to understand the characteristics of your dataset.
The numbers and visuals produced during EDA help you answer questions such as the following:
- Does the data have any outliers?
- Are any values missing, and if so, where?
- Does the data contain clusters of similar values, or are the values spread broadly across a wide range?
- Do values in one column imply anything about probable values in another column?
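Each question in the list maps onto a simple numeric check. The sketch below assumes a small, hypothetical toy dataset with made-up `age` and `income` columns (`None` marks a missing value), and every helper name (`count_missing`, `describe`, `iqr_outliers`, `text_histogram`, `pearson`) is illustrative rather than part of any library. It uses only the Python standard library. Two specific techniques are swapped in as common conventions rather than the only options: the 1.5 × IQR rule for flagging outliers and Pearson correlation for relating two columns. `statistics.quantiles` requires Python 3.8 or later.

```python
import math
import statistics

# Hypothetical toy dataset: one dict per row; None marks a missing value.
ROWS = [
    {"age": 23, "income": 41_000},
    {"age": 35, "income": 58_000},
    {"age": 31, "income": None},
    {"age": 47, "income": 72_000},
    {"age": 29, "income": 50_000},
    {"age": 52, "income": 81_000},
    {"age": 38, "income": 300_000},  # deliberately extreme value
]


def column(rows, name):
    """Return the non-missing values of one column."""
    return [row[name] for row in rows if row[name] is not None]


def count_missing(rows, name):
    """Are any values missing? Count the None entries in a column."""
    return sum(1 for row in rows if row[name] is None)


def describe(values):
    """Descriptive statistics summarizing the center and spread of a column."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }


def iqr_outliers(values, k=1.5):
    """Any outliers? Flag values more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def text_histogram(values, bins=5):
    """Clustered or spread out? Count values in equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1  # avoid zero width when all values are equal
    counts = [0] * bins
    for v in values:
        index = min(int((v - lo) / width), bins - 1)  # the max goes in the last bin
        counts[index] += 1
    return [(lo + i * width, count) for i, count in enumerate(counts)]


def pearson(rows, x_name, y_name):
    """Does one column track another? Pearson r over rows with both values present."""
    pairs = [(r[x_name], r[y_name]) for r in rows
             if r[x_name] is not None and r[y_name] is not None]
    xs, ys = zip(*pairs)
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    incomes = column(ROWS, "income")
    print("missing income values:", count_missing(ROWS, "income"))
    print("income summary:", describe(incomes))
    print("income outliers:", iqr_outliers(incomes))
    # A crude text "visual": one '#' per value in each bin.
    for start, count in text_histogram(incomes):
        print(f"{start:>10.0f} | {'#' * count}")
    print("age vs income r:", round(pearson(ROWS, "age", "income"), 3))
```

On this toy data, the income column has one missing value, and the 300,000 entry is the only value flagged by the IQR rule. It also sits alone in the top histogram bin, while the other five incomes fall together in the lowest bin. In day-to-day work, a library such as pandas covers the same ground with `DataFrame.isna()`, `DataFrame.describe()`, and `DataFrame.corr()`. The hand-written version simply makes each check visible.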
Answering these questions fills gaps in your knowledge that matter for the tasks you’re trying to accomplish. Additionally, what you discover during EDA can guide your data wrangling by identifying outliers...