Structured datasets
Structured datasets include simple delimited files, such as the familiar comma-separated values (CSV) format, and files that incorporate more descriptive metadata, such as XML and HDF5.
In this section, we will also discuss the important topic of DataFrames in Julia. The concept will be familiar to R users, where data frames are built into the language, and Python offers a similar structure through the pandas module.
CSV and other delimited (DLM) files
Data is often presented in tabular form: a series of rows, each representing an individual record, and fields (columns) holding a data value for that particular record. This is more structured than the relatively free-form files we have looked at so far.
Each column is consistent, in the sense that all of its values share a single type (integers, floats, dates, and so on) and are treated as the same “class” of data.
This might be familiar to you as it maps directly to the way data is held in a spreadsheet.
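As a minimal sketch of this row-and-column structure, the example below uses the `readdlm` function from Julia's DelimitedFiles standard library to parse a small CSV held in memory. The column names and values are made up purely for illustration, and the chapter may well rely on dedicated packages rather than this approach.

```julia
using DelimitedFiles  # standard library for reading and writing delimited text

# A tiny, hypothetical CSV file: one header line, then one record per row.
csv_text = """
name,age,height
Alice,34,1.68
Bob,29,1.82
Carol,41,1.75
"""

# With header=true, readdlm returns a tuple: (data matrix, 1×n header row).
data, header = readdlm(IOBuffer(csv_text), ',', header=true)

# Each column holds a single "class" of data, so we can recover typed vectors.
people  = String.(data[:, 1])   # text column
ages    = Int.(data[:, 2])      # integer column
heights = Float64.(data[:, 3])  # floating-point column

# Each row is one record; walk the rows and combine fields from several columns.
for i in axes(data, 1)
    println(people[i], " is ", ages[i], " years old and ", heights[i], " m tall")
end
```

Because the columns mix strings and numbers, `readdlm` returns a single heterogeneous `Matrix{Any}`, so the per-column types have to be restored by hand. Keeping a separate, typed vector for each column is essentially the idea behind a DataFrame.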
CSV file format
One of the oldest forms of representing...