Joining data from multiple tables
We have learned the basics of data wrangling with DuckDB. Now, let’s move on to a more detailed example: combining data from multiple tables of real-world taxi trip records to analyze passenger movement and tipping behavior.
New York taxi data
The NYC Taxi and Limousine Commission (TLC) publishes records of the trips taken by yellow and green taxis in New York City. The dataset includes variables such as pickup and drop-off locations, dates and times, passenger counts, trip distances, fares, tips, tolls, and payment types. It is publicly available and well suited to analyzing traffic patterns and tipping behavior.
The dataset is updated monthly and can be accessed through the New York City website (https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page). For this exercise, we will be using the data from January 2023 for yellow taxi...