The goal
This example takes the historical New York City Yellow Cab trip data and replays it as a simulated real-time feed. Each record in the source data contains the pickup time, pickup coordinates (latitude and longitude), passenger count, trip distance, drop-off time, drop-off coordinates, total fare amount, and many other fields.
We want to implement a simple pipeline that processes this data and advises a taxi driver looking for passengers (the user of this application) which direction to drive in. The advice, computed from the data in real time, should maximize both the chance of picking up a passenger and the fare the driver will earn.
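To make the idea of replaying historical trips as a live feed concrete, here is a minimal Python sketch. It is not the pipeline itself, only an illustration under stated assumptions: the `TaxiTrip` class, its field names, and the `replay` function with its `speedup` parameter are hypothetical names chosen for this example, and the fields cover only the ones listed above.

```python
import time
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Iterable, Iterator


@dataclass(frozen=True)
class TaxiTrip:
    """One trip record. Field names are illustrative, not the dataset's column names."""
    pickup_time: datetime
    pickup_lat: float
    pickup_lon: float
    passenger_count: int
    trip_distance: float
    dropoff_time: datetime
    dropoff_lat: float
    dropoff_lon: float
    total_amount: float


def replay(
    trips: Iterable[TaxiTrip],
    speedup: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[TaxiTrip]:
    """Yield trips in pickup-time order, waiting between consecutive trips.

    The wait is the historical gap between pickups divided by `speedup`,
    so speedup=60 compresses one minute of history into one second.
    """
    previous_pickup = None
    for trip in sorted(trips, key=lambda t: t.pickup_time):
        if previous_pickup is not None:
            gap_seconds = (trip.pickup_time - previous_pickup).total_seconds()
            sleep(gap_seconds / speedup)
        previous_pickup = trip.pickup_time
        yield trip
```

A downstream consumer can then iterate over `replay(trips)` as if the records were arriving live. The `sleep` parameter is injectable so that the pacing can be checked or disabled without actually waiting. Note that this sketch sorts the whole input in memory, which is only reasonable for a small sample of the data.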