Exploratory Data Analysis
In Chapter 17, we discussed the challenges of using machine learning without premium or embedded capacity. One of the key pitfalls we highlighted was blindly applying automated machine learning (AutoML) to a dataset, which often produces inaccurate models. The way to avoid this pitfall is to first develop a thorough understanding of the dataset's inherent characteristics.
This chapter introduces exploratory data analysis (EDA) for that purpose. Pioneered by John Tukey, EDA encourages statisticians to explore the data thoroughly and to use what they find to formulate hypotheses. This exploration deepens our understanding of the dataset and can reveal meaningful patterns and relationships among the variables.
EDA techniques help you make informed decisions when selecting the most appropriate machine learning models and feature engineering methods. This chapter...