Practical Case Study 2: NYC Yellow Taxi Trip Analysis
In this case study, we will incrementally develop another script to process data. For this example, we will deal with a much larger dataset than the previous one.
Note
The operations we will perform on the data here are more complex than those in the previous study. In particular, we will process each line of the file individually, in nontrivial ways. For this kind of work it is sometimes better to use an external tool such as awk, or even a Python script, because the shell has performance limits, especially when we do not use pipelines. This example is mainly meant to demonstrate how to program with the shell; it does not suggest that the student should always blindly use only the shell.
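To make that trade-off concrete, here is a minimal sketch that sums one numeric column of a small CSV file twice: once with a pure-shell `while read` loop, and once with a single awk call. The file contents and the column layout (`vendor,distance,fare`) are made up for illustration and are not the real schema of the taxi data.

```shell
#!/bin/sh
# Hypothetical sample file: a header line plus three rows; column 3 is "fare".
sample=$(mktemp)
printf 'vendor,distance,fare\n1,2.5,10\n2,1.0,6\n1,3.0,12\n' > "$sample"

# 1) Pure-shell loop: the shell itself interprets every iteration.
#    Shell arithmetic is integer-only, so this only works because the
#    sample fares are whole numbers.
total_loop=0
{
    read -r _header                      # skip the header line
    while IFS=, read -r vendor distance fare; do
        total_loop=$(( total_loop + fare ))
    done
} < "$sample"

# 2) awk: one external process reads and sums every data line.
total_awk=$(awk -F, 'NR > 1 { sum += $3 } END { print sum }' "$sample")

echo "loop: $total_loop, awk: $total_awk"
rm -f "$sample"
```

Both approaches print a total of 28 here. On a file with hundreds of thousands of lines, the awk version is typically much faster, and it also handles decimal fares, which the integer-only shell loop cannot.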
Understanding the Dataset
The dataset for this study is a text file containing public data about yellow taxi trips in New York City in 2017. We will use a subset of 200,000 lines of that data for this book...
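As a rough sketch of how such a subset could be produced and inspected, the hypothetical helper `make_subset` below keeps the header line plus the first N data rows of a CSV file. It assumes the subset is simply taken from the top of the full file; the text does not say how its 200,000-line subset was actually selected. A small synthetic file stands in for the real trip data so the sketch runs on its own.

```shell
#!/bin/sh
# Hypothetical helper: copy the header plus the first N data rows of a CSV.
make_subset() {
    src=$1
    dst=$2
    rows=$3
    head -n "$(( rows + 1 ))" "$src" > "$dst"
}

# Synthetic stand-in for the full trip file: a header plus 1,000 rows.
full=$(mktemp)
subset=$(mktemp)
{ echo 'id,fare'; seq 1 1000 | sed 's/$/,10/'; } > "$full"

# With the real data, this would be 200000 instead of 200.
make_subset "$full" "$subset" 200

# Quick inspection: line count (header included), first rows, last row.
line_count=$(wc -l < "$subset" | tr -d ' ')
last_row=$(tail -n 1 "$subset")
echo "lines: $line_count"
head -n 3 "$subset"
echo "last row: $last_row"

rm -f "$full" "$subset"
```

Checking the line count and peeking at the first few rows with `wc -l` and `head` is a cheap way to confirm a subset looks as expected before writing any further processing.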