We have gone through all the components of Apache Kafka and the different APIs that can be used to develop an application with Kafka. In the previous chapter, we learned about Kafka producers, brokers, and consumers, along with concepts related to best practices for using Kafka as a messaging system.
In this chapter, we will cover Apache Spark, a distributed in-memory processing engine. We will then walk through Spark Streaming concepts and see how to integrate Apache Kafka with Spark.
In short, we will cover the following topics:
- Introduction to Spark
- Spark internals, such as RDDs
- Spark Streaming
- Receiver-based approach (Spark-Kafka integration)
- Direct approach (Spark-Kafka integration)
- Use case (Log processing)