Task 6 – Using an external service for data augmentation
All of the tasks we solved in the previous chapter had their data readily available in the input PCollection object. That is not always the case. Imagine a situation in which you need to augment your input data with metadata that is located behind an external service. This external service is accessible via a Remote Procedure Call (RPC), as illustrated in the following figure:
We feed our input data to a (stateless) operation, which performs an RPC for each input element (possibly with some caching), uses the result to modify the input element, and then either outputs the element to downstream processing or discards it. From this description, we will create a definition of the problem.
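The operation described above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not Beam code: `MetadataService`, `Augmenter`, and the `element:metadata` output format are hypothetical names chosen for this example, and the lambda in `main` stands in for a real RPC client. In a Beam pipeline, logic like the body of `process` would typically live in a `DoFn` method annotated with `@ProcessElement`.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Sketch only: plain Java with no Beam dependency. All names are hypothetical.
class RpcAugmentSketch {

  /** Stand-in for the external service; a real client would make a network call. */
  interface MetadataService {
    Optional<String> lookup(String key);
  }

  /** Per-element operation that augments or discards each input element. */
  static class Augmenter {
    private final MetadataService service;
    // Local cache of previous responses. This is an optimization only,
    // not pipeline state: losing it changes cost, not results.
    private final Map<String, Optional<String>> cache = new HashMap<>();

    Augmenter(MetadataService service) {
      this.service = service;
    }

    /** Returns the augmented element, or empty if the element should be discarded. */
    Optional<String> process(String element) {
      // Call the service only when the element has not been seen before.
      Optional<String> metadata = cache.computeIfAbsent(element, service::lookup);
      // Assumption for this sketch: elements without metadata are dropped.
      return metadata.map(m -> element + ":" + m);
    }
  }

  public static void main(String[] args) {
    // Fake service: knows every key except "unknown".
    MetadataService fake =
        key -> key.equals("unknown") ? Optional.<String>empty() : Optional.of(key.toUpperCase());
    Augmenter augmenter = new Augmenter(fake);

    for (String element : List.of("a", "b", "a", "unknown")) {
      augmenter.process(element).ifPresent(System.out::println);
    }
  }
}
```

The cache here is unbounded and keeps negative responses forever, which is acceptable for a sketch but would need size limits and expiry if the service's answers can change over time.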
Defining the problem
Given an input stream of lines of text (coming from Apache Kafka) and an RPC service that...