Working with DataFrames and tidy data
To work within the grammar of graphics, we need data. But not just any data: we need tidy data. Tidy data is data arranged in a table in which each row represents an observation and each column represents a variable. This layout is essential because we usually want to map data variables, and therefore columns, to the aesthetics of geometries. We typically use the DataFrame data structure to represent and store this kind of data. The DataFrames package defines this structure for the Julia language and exports many useful functions to work with it. We will explore this package in this section.
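To make the idea of tidy data concrete, here is a minimal sketch assuming DataFrames.jl 1.x. It starts from made-up values in a "wide" layout, where the year variable is hidden in the column names, and uses the stack function to reshape the table so that each row is one observation and each column is one variable. The column names and values are hypothetical and serve only as an illustration.

```julia
using DataFrames

# Made-up values in a "wide" layout: the year is a variable, but it is
# spread across the column names "2020" and "2021".
wide = DataFrame(
    "city" => ["Lima", "Quito"],
    "2020" => [1.0, 2.0],
    "2021" => [3.0, 4.0],
)

# stack moves the year columns into rows, so the result has one
# observation per row and one variable per column: city, year, value.
tidy = stack(wide, ["2020", "2021"]; variable_name = :year, value_name = :value)

# The year column holds the former column names as strings; parse them
# into integers so the year can later be mapped to a numeric axis.
tidy.year = parse.(Int, tidy.year)

println(tidy)
```

In the tidy version, every variable has its own column, so each one can be mapped directly to an aesthetic, for example year to the x axis, value to the y axis, and city to color.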
Tabular data is commonly stored in text files in Comma-Separated Values (CSV) format; therefore, we typically need the CSV package to load it into a DataFrame. In addition, the RDatasets and VegaDatasets packages provide a series of helpful datasets that we will use for demonstration purposes throughout this book. Now, let’s...