Writing a custom data sink
As opposed to a data source, a data sink has much less work to do. Actually – in trivial cases – a data sink can be implemented using a plain ParDo
object. In fact, we have already implemented one of these, which was PrintElements
, located in the util
module. The PrintElements
transform can be considered a sink to stderr
, as we can see from this implementation:
@Override public PDone expand(PCollection<T> input) { input.apply(ParDo.of(new LogResultsFn<>())); return PDone.in(input.getPipeline()); } private static class LogResultsFn<T> extends DoFn<T, Void> { @ProcessElement public void process(@Element T elem) { System.err.println(elem); } }
This sink is very simplistic – a real-life solution would need some of the tools we already know. For example, batching RPCs using bundle life cycles via @StartBundle
and @FinishBundle
...