Task 15 – Implementing SchemaSportTracker
In this section, we will reimplement a task from Chapter 2, Implementing, Testing, and Deploying Basic Pipelines. We have included this to learn how to overcome some limitations of SQL when using schemas – notably, the (current) inability to perform aggregation (UDAF) using multiple fields. In our computation, we need to aggregate a composite (a Row
) that has three fields – latitude
, longitude
, and timestamp
.
Again, for clarity, let's recap the definition of our problem.
Problem definition
Given a stream of GPS locations and timestamps for a workout of a specific user (a workout has an ID that is guaranteed to be unique among all users), compute the performance metrics for each workout. These metrics should contain the total duration and distance elapsed from the start of the workout to the present.
Problem decomposition discussion
The actual business logic of computing the distance from GPS location...