Summary
In this chapter, we walked through a complete EDA project, from loading the data into RStudio to producing an analysis report.
After loading the data, we examined the shape of the dataset and its data types, and we converted some variables to factors. Moving on, we cleaned the data of missing values and started the exploration and visualization part. This began with a check of the descriptive statistics, after which we looked at the distributions of the data and at outlier detection. We then looked at a bivariate chart and a pair plot showing correlations and scatterplots, which helps us understand the relationships between the variables and get a feel for the best candidates for modeling.
Next, we asked questions to guide our exploration, always answering them with data and statistical tests. Finally, to close the chapter, we presented an example analysis report that highlights the findings in text form.