In this chapter, we will present typical use cases for Spark SQL in streaming applications. Our focus will be on Structured Streaming, which was introduced in Spark 2.0 and is built on the Dataset/DataFrame APIs. We will also introduce and work with Apache Kafka, as it is an integral part of many web-scale streaming application architectures. Streaming applications typically respond to incoming data or messages in real time, taking the surrounding context into account. We will use several examples to illustrate the key concepts and techniques for building such applications.
In this chapter, we will cover the following topics:
- What is a streaming data application?
- Typical streaming use cases
- Using Spark SQL DataFrame/Dataset APIs to build streaming applications
- Using Kafka in Structured Streaming applications
- Creating a receiver for a custom data source