About the sample data
The data we will be using in this chapter is a large dataset downloaded from a U.S. Government "public data" website: http://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=236&DB_Short_Name=On-Time.
The data concerns commercial flights inside the USA, starting in 1987 and going right through to the present day.
This is the same data we glimpsed back in Chapter 1, So, What Is This SAP HANA Thing Anyways?, which demonstrated the relative speed of SAP HANA compared to other database systems.
We'll only be looking at the data for 1988, which amounts to over 5 million lines. This data, originally 12 files (one per month), has been compiled into a single CSV file containing just over 1.2 GB of raw data.
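If you download the monthly files yourself, you will need to combine them in the same way. The following is a minimal sketch of one way to do it in Python, not necessarily the method used to prepare the file for this book. The function name combine_monthly_csvs and the file names mentioned afterwards are hypothetical, and the sketch assumes that every monthly file is UTF-8 encoded and starts with the same header row.

```python
import csv


def combine_monthly_csvs(monthly_files, combined_path):
    """Merge several CSV files into one, writing the header row only once.

    Assumptions (not taken from the book): the files are UTF-8 encoded and
    all share an identical header row. Returns the number of data rows written.
    """
    header = None
    rows_written = 0
    with open(combined_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for path in monthly_files:
            with open(path, newline="", encoding="utf-8") as src:
                reader = csv.reader(src)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # empty file: nothing to copy
                if header is None:
                    # First non-empty file: keep its header for the output.
                    header = file_header
                    writer.writerow(header)
                elif file_header != header:
                    raise ValueError(f"Header of {path} differs from the first file")
                # Stream row by row so a large file never has to fit in memory.
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

For example, with the twelve downloads saved under hypothetical names such as 1988_01.csv through 1988_12.csv, you could pass the sorted list of those paths together with an output name such as flights_1988.csv. The returned row count gives a quick check against the "over 5 million lines" figure.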
Note
As explained in Chapter 1, So, What Is This SAP HANA Thing Anyways?, the sheer size of the data (57 MB compressed for 1988 alone) means it is not possible to distribute the dataset with the book. However, you can download it yourself from the given URL; it's publicly...