There are several Kafka connectors for Apache Spark. In this case, we are using the connector developed by Databricks, the company founded by the original creators of Apache Spark.
Using this Spark Kafka connector, we can read data from a Kafka topic with Spark Structured Streaming:
Dataset<Row> inputDataset = spark
.readStream()
.format("kafka")
.option("kafka.bootstrap.servers", brokers)
.option("subscribe", Constants.getHealthChecksTopic())
.load();
Simply by specifying the kafka format, Spark reads a stream from the topic given in the subscribe option, connecting to the brokers listed in kafka.bootstrap.servers.
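Note that this connector is not on Spark's default classpath; it usually has to be declared as a build dependency. A minimal Maven sketch follows, where the Scala suffix and version numbers are assumptions to be matched against your own Spark build:

```xml
<!-- Kafka source for Spark Structured Streaming; versions shown are illustrative -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql-kafka-0-10_2.12</artifactId>
  <version>3.5.0</version>
</dependency>
```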
At this point in the code, if you invoke the printSchema() method on inputDataset, the output will look similar to Figure 8.1:
Figure 8.1: Print schema output
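For reference, the schema that Spark's Kafka source exposes typically looks like this:

```
root
 |-- key: binary (nullable = true)
 |-- value: binary (nullable = true)
 |-- topic: string (nullable = true)
 |-- partition: integer (nullable = true)
 |-- offset: long (nullable = true)
 |-- timestamp: timestamp (nullable = true)
 |-- timestampType: integer (nullable = true)
```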
We can interpret this as follows:
- The key and the value are binary...
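Because the key and value arrive as raw bytes, they must be deserialized before use; in Spark this is commonly done with a cast, for example inputDataset.selectExpr("CAST(value AS STRING)"). The underlying idea can be sketched in plain Java (the sample payload below is an assumption for illustration):

```java
import java.nio.charset.StandardCharsets;

public class BinaryValueDemo {
    public static void main(String[] args) {
        // Kafka delivers keys and values as raw byte arrays
        byte[] rawValue = "{\"status\":\"OK\"}".getBytes(StandardCharsets.UTF_8);

        // Deserializing means interpreting those bytes, here as a UTF-8 string
        String value = new String(rawValue, StandardCharsets.UTF_8);
        System.out.println(value); // prints {"status":"OK"}
    }
}
```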