Basic statistical summaries
Although we are currently using RDatasets, which is well documented, these methods and techniques can be extended to other datasets.
Let's use a different dataset from the RDatasets package: exam scores from Inner London. To get some information about the dataset, we will use the describe() function, which we have already discussed in previous chapters.
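As a minimal sketch of this step, the snippet below loads a dataset and summarizes it. It assumes that the Inner London exam scores are the "Exam" table of the "mlmRev" package in RDatasets, and that the RDatasets and DataFrames packages are installed. The layout of the describe() output depends on the DataFrames version: recent versions return a table with columns such as mean, min, median, max, nmissing, and eltype, so it may not match the fields discussed in this section exactly.

```julia
using RDatasets    # assumed to be installed; re-exports the DataFrames API
using DataFrames

# Assumption: the Inner London exam scores are the "Exam" table
# of the "mlmRev" package shipped with RDatasets.
exam = dataset("mlmRev", "Exam")

println(size(exam))     # (number of rows, number of columns)
println(names(exam))    # column names, including School

# Per-column summary; the exact fields shown depend on the DataFrames version.
println(describe(exam))
```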
The fields in the describe() output are as follows:
Length refers to the number of records (rows).
Type refers to the data type of the column. For example, School is of the Pooled ASCIIString data type.
NA and NA% refer to the number and percentage of NA values present in the column. This is really helpful, as you no longer need to check for missing records manually.
Unique refers to the number of unique records present in the column.
Min and Max are the minimum and maximum values present in the column (this does not apply to columns having ASCIIStrings...
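
To make these fields concrete, the following sketch computes the same quantities by hand for a single column, using only base Julia. The helper summarize_column and the sample vectors are hypothetical and are not part of DataFrames. Current Julia represents absent values as missing rather than NA, and this sketch counts unique values among the non-missing entries only, which may differ from what describe() reports.

```julia
# Hypothetical helper (not part of DataFrames): computes, for one column,
# the quantities named above. `missing` plays the role of NA here.
function summarize_column(col::AbstractVector)
    present = collect(skipmissing(col))       # entries that are not missing
    n_missing = count(ismissing, col)
    # Min and Max are only reported when every present value is numeric
    numeric = !isempty(present) && all(x -> x isa Real, present)
    return (
        len = length(col),                            # Length
        coltype = nonmissingtype(eltype(col)),        # Type
        n_missing = n_missing,                        # NA
        pct_missing = 100 * n_missing / length(col),  # NA%
        n_unique = length(unique(present)),           # Unique (non-missing only)
        min_value = numeric ? minimum(present) : nothing,  # Min
        max_value = numeric ? maximum(present) : nothing,  # Max
    )
end

# Made-up sample data, for illustration only
scores = [0.26, missing, -1.5, 0.26, 1.1]   # numeric column with one missing entry
schools = ["1", "1", "2", "3"]              # string column: no Min or Max

s = summarize_column(scores)
t = summarize_column(schools)
println(s)
println(t)
```

For the string column, min_value and max_value come back as nothing, mirroring the note that Min and Max are not reported for string columns.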