Chapter 14: Exploratory Data Analysis
In Chapter 13, Using Machine Learning without Premium or Embedded Capacity, we mentioned that blindly applying Automated Machine Learning (AutoML) solutions to a dataset often does not lead to very accurate models. To get a better model, you first need to understand the inherent characteristics of the dataset, using statistical tools at an early stage to extract useful information from it.
This type of dataset analysis is called Exploratory Data Analysis (EDA). It was first introduced by John Tukey to encourage statisticians to explore data and formulate hypotheses that would lead to new data collection and experiments, and eventually to a richer understanding of the patterns among the variables in a dataset.
In this chapter, you will learn about the following topics:
- What is the goal of EDA?
- EDA with Python and R
- EDA in Power BI