Chapter 2: Exploratory Analysis of Data
Activity 4: Plotting Multiple Density Plots and Boxplots
First, load the necessary libraries in RStudio:
library(ggplot2)
library(cowplot)
Read the bank-additional-full.csv dataset into a data frame named df:
df <- read.csv("bank-additional-full.csv",sep=';')
Define the plot_grid_numeric function for density plots:
plot_grid_numeric <- function(df, list_of_variables, ncols = 2) {
  plt_matrix <- list()
  i <- 1
  for (column in list_of_variables) {
    # Build one density plot per variable
    plt_matrix[[i]] <- ggplot(data = df, aes_string(x = column)) +
      geom_density(fill = "red", alpha = 0.5) +
      ggtitle(paste("Density Plot for variable:", column)) +
      theme_bw()
    i <- i + 1
  }
  # Arrange the plots in a grid using the requested number of columns
  plot_grid(plotlist = plt_matrix, ncol = ncols)
}
Plot the density plots for the campaign, pdays, previous, and emp.var.rate variables:
plot_grid_numeric(df,c("campaign","pdays","previous","emp.var.rate"),2)
The output is as follows:
Observe that the interpretations we drew from the histograms hold for the density plots as well. The density plot therefore serves as an alternative way of looking at the same trend.
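To see why the two views tend to agree, the following minimal sketch compares a histogram and a kernel density estimate numerically. It uses only base R and a synthetic right-skewed sample; the sample and the variable names (x, h, d, and so on) are assumptions made for illustration and are not taken from the bank dataset.

```r
# Synthetic right-skewed sample (illustrative only, not from bank-additional-full.csv)
set.seed(42)
x <- rexp(1000, rate = 0.5)

# Histogram computed without plotting; h$density is scaled so bar areas sum to 1
h <- hist(x, breaks = 30, plot = FALSE)

# Kernel density estimate of the same sample, evaluated on an equally spaced grid
d <- density(x)

# Total area under each summary: both describe one distribution, so both are close to 1
hist_area <- sum(h$density * diff(h$breaks))
dens_area <- sum(d$y) * diff(d$x[1:2])  # Riemann-sum approximation of the integral

# Location of the highest point in each summary
hist_mode <- h$mids[which.max(h$density)]
dens_mode <- d$x[which.max(d$y)]
```

Both summaries place their peak near the low end of the range, which is the same skew a histogram and a density plot would show side by side. The density curve is a smoothed estimate, so its peak need not coincide exactly with the tallest histogram bar.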
Repeat the function definition and the plotting step for boxplots:
plot_grid_numeric <- function(df, list_of_variables, ncols = 2) {
  plt_matrix <- list()
  i <- 1
  for (column in list_of_variables) {
    # Build one boxplot per variable
    plt_matrix[[i]] <- ggplot(data = df, aes_string(y = column)) +
      geom_boxplot(outlier.colour = "black") +
      ggtitle(paste("Boxplot for variable:", column)) +
      theme_bw()
    i <- i + 1
  }
  # Arrange the plots in a grid using the requested number of columns
  plot_grid(plotlist = plt_matrix, ncol = ncols)
}
plot_grid_numeric(df, c("campaign", "pdays", "previous", "emp.var.rate"), 2)
The output is as follows:
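When reading boxplots, it helps to know which numbers each box encodes. As a minimal sketch on made-up values (not the bank data), base R's boxplot.stats() returns the five values a boxplot draws, together with the points flagged as outliers. Note that ggplot2's geom_boxplot computes its quartiles slightly differently from base R's hinges, so the two can differ marginally on real data.

```r
# Small illustrative vector (hypothetical values) containing one extreme point
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 5, 40)

# Five-number summary used to draw the box and whiskers, plus the outliers
s <- boxplot.stats(x)

s$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
s$out    # points lying more than 1.5 * IQR beyond the hinges
```

Here the value 40 lies far beyond the upper hinge, so it is reported in s$out and would be drawn as an individual point rather than stretching the whisker.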
Now, let's explore the last four numeric variables of the dataset, that is, nr.employed, euribor3m, cons.conf.index, and duration, and see whether we can derive some meaningful insights.