Joining streaming data with static data in Apache Spark Structured Streaming and Delta Lake
In this recipe, you will learn how to join streaming data with static data in Apache Spark Structured Streaming and Delta Lake. This is a common use case for many applications that need to enrich streaming data with additional information from a historical or reference dataset. For example, you may want to join a stream of user events with a static table of user profiles, or a stream of orders with a static table of product details.
Getting ready
Before we start, we need to make sure that we have a Kafka cluster running and a topic that receives streaming data. For simplicity, we will use a single-node Kafka cluster and a topic named orders. Open the 5.0 orders-gen-kafka.ipynb notebook and execute the cell. This notebook simulates a stream of online orders, each of which contains an order ID, a product ID, a quantity, and a timestamp.
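The notebook's actual code is not shown here, but an order generator of this kind can be sketched as follows. This is a hypothetical stand-in, not the contents of 5.0 orders-gen-kafka.ipynb: the field names, the product IDs, and the use of the `kafka-python` client in `send_orders` are all assumptions. Orders are serialized as JSON strings, one per Kafka message, and a seeded `random.Random` makes the generated IDs, products, and quantities reproducible.

```python
import json
import random
import uuid
from datetime import datetime, timezone
from typing import Iterator

# Hypothetical product catalogue; the real notebook may use different IDs.
PRODUCT_IDS = ["p1", "p2", "p3", "p4", "p5"]


def make_order(rng: random.Random) -> dict:
    """Build one simulated order with an ID, product ID, quantity, and timestamp."""
    return {
        "order_id": str(uuid.UUID(int=rng.getrandbits(128))),  # deterministic per seed
        "product_id": rng.choice(PRODUCT_IDS),
        "quantity": rng.randint(1, 10),
        # ISO-8601 with a UTC offset, e.g. 2024-01-01T12:00:00+00:00
        "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds"),
    }


def order_messages(n: int, seed: int = 42) -> Iterator[str]:
    """Yield n simulated orders, each serialized as a JSON string."""
    rng = random.Random(seed)
    for _ in range(n):
        yield json.dumps(make_order(rng))


def send_orders(n: int, bootstrap_servers: str = "localhost:9092") -> None:
    """Publish n simulated orders to the `orders` topic (requires kafka-python)."""
    from kafka import KafkaProducer  # imported lazily: optional dependency

    producer = KafkaProducer(
        bootstrap_servers=bootstrap_servers,
        value_serializer=lambda s: s.encode("utf-8"),
    )
    for message in order_messages(n):
        producer.send("orders", value=message)
    producer.flush()  # block until all buffered messages are delivered
```

Separating message generation (`order_messages`) from delivery (`send_orders`) lets the payload format be checked without a running broker.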
Make sure you have run this notebook and...