Reading Kafka with Apache Beam
According to the definition, Apache Beam is an open source unified programming model to define and execute data processing pipelines, including ETL, batch, and stream processing. This recipe shows how to read Kafka with Apache Beam.
Getting ready
To install Apache Beam, follow the instructions at: https://beam.apache.org/get-started/quickstart-py/.
How to do it...
The following code shows how to write a Beam pipeline to read from Kafka. The example illustrates various options for configuring the Beam source:
pipeline .apply(KafkaIO.read() .withBootstrapServers("broker_1:9092,broker_2:9092") .withTopics(ImmutableList.of("topic_1", "topic_2")) .withKeyCoder(BigEndianLongCoder.of()) .withValueCoder(StringUtf8Coder.of()) .updateConsumerProperties( ImmutableMap.of("receive.buffer.bytes", 1024 * 1024)) .withTimestampFn(new CustomTypestampFunction()) .withWatermarkFn...