Exploring and visualizing the data
Exploration is, naturally, the core of an EDA. In this section, the idea is to take a thorough look at the variables, understand their distributions, formulate questions that will guide the exploration, and use the data to answer them.
Univariate analysis
The first step is univariate analysis: looking at one variable at a time. A good way to begin is to create histograms and look at the distribution of each numeric variable. According to Hair Jr. et al. (2019), plotting the variables’ distributions and examining their shape is a good starting point for understanding the nature of those variables.
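To make the idea of a distribution's shape concrete, here is a minimal sketch of what a histogram computes for a single variable: it splits the range of values into equal-width bins and counts the observations in each. Python with only the standard library is used purely for illustration and is not necessarily the tooling used elsewhere in this chapter. The function names histogram_counts and print_histogram and the sample values are assumptions invented for this example.

```python
from collections import Counter


def histogram_counts(values, bins=5):
    """Count how many values fall into each of `bins` equal-width intervals."""
    lo, hi = min(values), max(values)
    # Fall back to a width of 1 when every value is identical (range of zero).
    width = (hi - lo) / bins or 1
    counts = Counter()
    for v in values:
        # The maximum would land one past the last bin, so clamp it back in.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    # One (left edge, right edge, count) tuple per bin, empty bins included.
    return [(lo + i * width, lo + (i + 1) * width, counts[i]) for i in range(bins)]


def print_histogram(values, bins=5):
    """Print a rough text histogram, one row of '#' marks per bin."""
    for left, right, n in histogram_counts(values, bins):
        print(f"[{left:6.1f}, {right:6.1f}) {'#' * n}")


# Hypothetical sample values, invented for illustration only.
sample = [1, 2, 2, 3, 3, 3, 4, 4, 5, 12]
print_histogram(sample)
```

With this sample, most observations pile up in the first two bins while the single value 12 sits alone in the last one, which is the kind of long right tail that a histogram makes visible at a glance.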
In the next code snippet, we will loop through all the numeric variables and plot one histogram for each. It starts with a for loop that iterates over the names of the columns with a numeric type (this is why knowing the data types, from previous sections, is important). If it is...