Loading the dataset into RStudio
First, we need to load the libraries used in this EDA. As seen in Chapter 8, the tidyverse package is composed of eight core libraries and is a robust tool for working with data in R. skimr calculates and displays descriptive statistics. statsr brings many statistical tools, and we will use it specifically to help with data sampling. GGally is used for pair plots and corrplot for correlation plots:
library(tidyverse)
library(skimr)
library(statsr)
library(GGally)
library(corrplot)
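If any of these libraries is not installed yet, library() stops with an error. The following is a minimal sketch, not part of the original workflow, for checking which of the five packages are missing before loading them. The helper name find_missing and the variable names are hypothetical, chosen only for illustration, and the sketch uses base R only.

```r
# Packages this section loads with library()
required_pkgs <- c("tidyverse", "skimr", "statsr", "GGally", "corrplot")

# Hypothetical helper: return the packages in `pkgs` that are not installed.
# requireNamespace() returns TRUE or FALSE instead of raising an error.
find_missing <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

missing_pkgs <- find_missing(required_pkgs)

if (length(missing_pkgs) > 0) {
  # Report what is missing; installing is left to the reader,
  # for example with install.packages(missing_pkgs)
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```

The sketch only reports missing packages rather than installing them, so running it does not change the R environment.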
To load the dataset into RStudio, we can use the read_csv() function. As we have seen many times so far, this function can read CSV files directly from a web address, so that is what we will do in the following code. We define a url variable holding the web address and pass it to the reading function:
# Path
url <- "https://raw.githubusercontent...