Summary stats
We will now cover some basic measures of central tendency, dispersion, and simple plots. The first question to address is how R handles missing values in calculations. To see what happens, create a vector with a missing value (NA in the R language), then sum the values of the vector with sum():
> a = c(1,2,3,NA)
> sum(a)
[1] NA
Unlike SAS, which would sum the non-missing values, R returns NA whenever at least one value is missing. We could create a new vector with the missing value deleted, but it is simpler to tell the function to exclude missing values with na.rm=TRUE:
> sum(a, na.rm=TRUE)
[1] 6
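The two approaches mentioned above, deleting the missing value versus skipping it inside the call, can be sketched side by side. This is a minimal illustration only; the name a_complete is made up for the example and does not come from the original text.

```r
# Same vector as above, with one missing value
a <- c(1, 2, 3, NA)

# Approach 1: build a new vector that drops the missing value.
# is.na() flags each NA; the ! negation keeps everything else.
a_complete <- a[!is.na(a)]   # a_complete is a hypothetical name
sum(a_complete)              # 6

# Approach 2: leave the vector untouched and skip NA in the call.
# na.rm is also accepted by mean(), not only by sum().
mean(a, na.rm = TRUE)        # 2

# Counting the missing values is often a useful first check
sum(is.na(a))                # 1
```

The second approach leaves the original vector unchanged, so the missing values are still there if a later step needs to know about them.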
R provides functions for the common measures of central tendency and dispersion of a vector:
> data = c(4,3,2,5.5,7.8,9,14,20)
> mean(data)
[1] 8.1625
> median(data)
[1] 6.65
> sd(data)
[1] 6.142112
> max(data)
[1] 20
> min(data)
[1] 2
> range(data)
[1] 2 20
> quantile(data)
0% 25%...
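As a rough sketch of how some of these measures relate to one another, the snippet below reuses the same data vector. The helper names v and r are illustrative only, and var() and diff() are base R functions not used in the text above.

```r
data <- c(4, 3, 2, 5.5, 7.8, 9, 14, 20)

# mean() is the sum divided by the number of values
sum(data) / length(data)       # 8.1625

# var() gives the sample variance; sd() is its square root
v <- var(data)                 # v is a hypothetical name
all.equal(sd(data), sqrt(v))   # TRUE

# range() returns c(min, max); diff() turns that into the spread
r <- range(data)               # r is a hypothetical name
diff(r)                        # 18
```

The comparison uses all.equal() rather than == because it tolerates small floating-point differences.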