Exploratory data analysis
Exploratory data analysis (EDA) techniques are used to discover patterns in the data, to summarize it, and to visualize it. EDA is an essential step in the data analysis process and helps to formulate hypotheses about the data.
EDA techniques can be broadly classified into three types: univariate, bivariate, and multivariate analysis. Let's implement a few of them on our dataset.
First, let's see what kind of data we are analyzing. Using the sapply function, we determine the various columns present in the dataset and the datatype of those columns:
sapply(ausersubset, class)
We get the following output:
Note
Note that the preceding screenshot is just a part of the output.
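To make the shape of this result concrete, here is a minimal sketch on a small made-up data frame. The name toy and its columns are hypothetical and are not part of ausersubset; only the sapply(..., class) call mirrors the one used above.

```r
# Hypothetical stand-in for the real dataset; the column names are invented.
toy <- data.frame(
  age    = c(23L, 35L, 41L),
  height = c(1.62, 1.80, 1.75),
  city   = c("Pune", "Oslo", "Lima"),
  stringsAsFactors = FALSE
)

# sapply applies class() to every column and, when each column has a single
# class, simplifies the result to a named character vector.
col_classes <- sapply(toy, class)
print(col_classes)
```

Each element of the result is named after a column and holds that column's class. If a column carries more than one class (a POSIXct date-time column, for instance), sapply cannot simplify to a vector and returns a list instead.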
To get a basic understanding of the whole dataset, such as the distribution of the values in each column, we can use the summary function to get the highlights of the dataset. For example, we will get the minimum, mean, median, maximum, and quartile values for each column...
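As a rough illustration of what summary reports, the sketch below runs it on a small invented data frame. The name toy and its columns are assumptions made for this example and do not come from ausersubset.

```r
# Hypothetical data: one numeric column and one factor column.
toy <- data.frame(
  score = c(2, 4, 4, 6, 9),
  group = factor(c("a", "b", "a", "b", "a"))
)

# For a data frame, summary() reports each column separately:
# numeric columns get min, quartiles, median, mean, and max;
# factor columns get a count per level.
print(summary(toy))

# The same statistics for a single numeric column, as a named vector.
score_summary <- summary(toy$score)
print(score_summary)
```

Individual statistics can be pulled out by name, for example score_summary[["Median"]], which is convenient when the values are needed in later calculations rather than only on screen.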