Some simple statistics
DataFrames are especially useful in the new, wide-ranging discipline commonly termed data science. Both Python and R are frequently seen as its cornerstones, but with its DataFrames module, extensive plotting options (see Chapter 8), and the parallel analytical engine JuliaDB (see Chapter 9), Julia presents an exciting (and fast) alternative.
In this section, we will apply some simple statistics to data sources from the RDatasets package:
julia> using RDatasets
julia> mlmf = dataset("mlmRev", "Gcsemv"); size(mlmf)
(1905, 5)
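As a rough illustration of the kind of simple statistics meant here, the sketch below uses only Julia's Statistics standard library on two small vectors. The names written and course and all of the values are hypothetical placeholders chosen for illustration; they are not taken from the dataset. The sketch assumes, as is common with exam data, that some scores are missing.

```julia
using Statistics

# Hypothetical toy scores (NOT taken from any real dataset);
# `missing` marks a pupil with no recorded result.
written = [23.0, missing, 39.0, 36.0, 16.0]
course  = [46.0, 58.0, missing, 71.0, 70.0]

# skipmissing drops the missing entries before each reduction.
mean_written   = mean(skipmissing(written))
median_written = median(skipmissing(written))
std_written    = std(skipmissing(written))

# For a correlation, keep only the pupils with both scores present.
keep = [!ismissing(w) && !ismissing(c) for (w, c) in zip(written, course)]
r = cor(Float64.(written[keep]), Float64.(course[keep]))

println("mean = ", mean_written, ", median = ", median_written)
println("std = ", std_written, ", correlation = ", r)
```

Applying mean directly to a vector that contains missing returns missing, which is why each reduction is wrapped in skipmissing, and why the correlation is computed on complete pairs only.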
We will use data from mlmRev, which is a group of datasets from the Multilevel Modelling Software Review. The Gcsemv dataset refers to the UK’s GCSE exam scores.
This covers the results from 73 schools both in terms of examination and coursework. The data is not split by subject (only school and pupil) but the gender of the student is provided. Schools...