Distributed data sources
The JuliaData group includes a package called JuliaDB, which was heralded as a package for working with large persistent datasets. However, its GitHub page carries a caveat: the package is now unmaintained and, at the time of writing, has been for over 2 years, which corresponds to version 1.4.x.
The page recommends using the DTables package instead, which is what we are going to do here.
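As a first taste of what working with DTables looks like, here is a minimal sketch, assuming the DTables package is installed. The team names and goal counts are made-up illustrative values, not data from this chapter. A DTable splits a table into partitions of a given number of rows, operations such as filter return a new DTable, and fetch collects the result back into an ordinary in-memory table.

```julia
using DTables

# Hypothetical toy data: four teams and the goals each scored
table = (team = ["A", "B", "C", "D"], goals = [3, 1, 2, 0])

# Build a distributed table with partitions of 2 rows each
dt = DTable(table, 2)

# Keep only the teams that scored; the result is again a DTable
scored = filter(row -> row.goals > 0, dt)

# Collect the partitions into a single, non-distributed table
result = fetch(scored)
println(result)
```

With a table this small there is nothing to gain from partitioning; the point is only that the same calls apply unchanged when the data is too large to handle comfortably in one piece.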
To move away from some of the datasets we have used previously, I am going to look at some statistics from football (soccer for those in the US).
We will need a few packages, which can be listed in the Project.toml file for this chapter. They can be loaded in the usual way:
julia> import Pkg; Pkg.activate(".")
julia> using Distributions, StatsBase, OnlineStats
julia> using DataFrames, DTables, Query, CSV, Printf
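If the Project.toml file does not yet list these packages, the project environment can be populated from Julia itself. The following is a sketch rather than the chapter's own setup script; the constant name CHAPTER_PACKAGES is an assumption introduced here for illustration, and the package list is simply the set of names used in the statements above.

```julia
import Pkg

# Hypothetical helper constant: the packages named in this section.
# Printf ships with Julia as a standard library but can still be listed.
const CHAPTER_PACKAGES = ["Distributions", "StatsBase", "OnlineStats",
                          "DataFrames", "DTables", "Query", "CSV", "Printf"]

# Use (or create) the project environment in the current directory
Pkg.activate(".")

# Install the packages and record them under [deps] in Project.toml
Pkg.add(CHAPTER_PACKAGES)

# Show what the active environment now contains
Pkg.status()
```

When a complete Project.toml is already supplied, calling Pkg.instantiate() after Pkg.activate(".") is usually enough, since it installs whatever the file declares.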
In the CSV folder of the DataSources directory are a couple of files referring...