Connecting Spark Streaming and Kafka
Apache Spark is an open source cluster computing framework. Spark's in-memory processing can run certain workloads up to 100 times faster than traditional disk-based frameworks. It is widely used for distributed real-time data analytics, and it integrates well with Kafka for reading and writing the data that Kafka carries.
Getting ready
For this recipe, a running Kafka cluster is needed. To install Apache Spark, follow the instructions at: https://spark.apache.org/downloads.html.
How to do it...
Spark provides a simple utility class, KafkaUtils, to create the data stream to be read from Kafka.
- The first thing in any Spark Streaming project is to create the Spark configuration and the Spark streaming context:
// Spark configuration with the application name
SparkConf sparkConf = new SparkConf().setAppName("KafkaSparkTest");
// Streaming context with a 10-second batch interval
JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, Durations.seconds(10));
- Then, create the HashSet for the topics and the Kafka consumer parameters (a fuller sketch follows the snippet below):
HashSet<String> topicsSet = new HashSet<String>();
topicsSet...
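The snippet above is truncated in the original. As a minimal sketch of how these pieces typically fit together, the following program wires the topic set and consumer parameters into KafkaUtils.createDirectStream, using the spark-streaming-kafka (Kafka 0.8 direct) API. The topic name test, the broker address localhost:9092, and the class name KafkaSparkTest are illustrative assumptions, not values from the recipe.

import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;

import kafka.serializer.StringDecoder;

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

public class KafkaSparkTest {
  public static void main(String[] args) throws Exception {
    // Spark configuration and streaming context, as in the first step
    SparkConf sparkConf = new SparkConf().setAppName("KafkaSparkTest");
    JavaStreamingContext jssc =
        new JavaStreamingContext(sparkConf, Durations.seconds(10));

    // Topics to subscribe to; "test" is a placeholder topic name
    HashSet<String> topicsSet = new HashSet<String>(Arrays.asList("test"));

    // Kafka consumer parameters; the broker list is an assumed address
    HashMap<String, String> kafkaParams = new HashMap<String, String>();
    kafkaParams.put("metadata.broker.list", "localhost:9092");

    // Create a direct stream that reads (key, value) pairs from Kafka
    JavaPairInputDStream<String, String> messages =
        KafkaUtils.createDirectStream(
            jssc,
            String.class, String.class,
            StringDecoder.class, StringDecoder.class,
            kafkaParams, topicsSet);

    // Print the message values received in each 10-second batch
    messages.map(tuple -> tuple._2()).print();

    jssc.start();
    jssc.awaitTermination();
  }
}

The direct stream reads from Kafka without a receiver, so each batch maps one-to-one onto a range of Kafka offsets; the batch interval passed to the streaming context (10 seconds here) controls how often a new batch of records is pulled and processed.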