Spark Streaming transformations and actions
Transformations and actions on a DStream boil down to transformations and actions on its underlying RDDs. Each batch interval produces one RDD, and an operation on the DStream is applied to every one of those RDDs. The DStream API offers many of the transformations available in the regular RDD API, plus special functions for streaming applications. Let's go through some of the important transformations.
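To make the "boils down to RDDs" idea concrete, here is a small pure-Python model. It is a sketch and not Spark code. A DStream is modeled as a list of batches, one per batch interval, and each inner list stands in for the RDD Spark would generate for that interval. The function name `dstream_map` is hypothetical; the Spark counterpart is `DStream.map`, which applies the function to every RDD in the stream.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def dstream_map(stream: List[List[T]], f: Callable[[T], U]) -> List[List[U]]:
    """Hypothetical model of DStream.map: the outer list holds one batch per
    batch interval, and each inner list stands in for that interval's RDD."""
    # The same function is applied to every batch, i.e. to every RDD.
    return [[f(x) for x in batch] for batch in stream]


# Three batch intervals; the last one received no data.
words = [["spark", "kafka"], ["flume"], []]
print(dstream_map(words, len))  # [[5, 5], [5], []]
```

The number of batches is preserved. Only the contents of each batch change, which mirrors how a DStream transformation yields one output RDD per input RDD.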
Union
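Two DStreams can be combined to create one DStream. For example, data received from multiple Kafka or Flume receivers can be combined into a single DStream. This is a common approach in Spark Streaming to increase scalability. Running several receivers in parallel spreads data ingestion across the cluster, and the union lets the rest of the application process their output as one stream: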
stream1 = ...
stream2 = ...
MultiDStream = stream1.union(stream2)
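The per-batch behavior of union can be sketched in pure Python under the same assumption as a simple model: a DStream is a list of batches, one per batch interval. The name `dstream_union` is hypothetical. The length check stands in for Spark's requirement that both DStreams share the same batch (slide) duration. Concatenation order here is an artifact of the model, because element order inside a batch is not meaningful in Spark.

```python
from typing import List, TypeVar

T = TypeVar("T")


def dstream_union(s1: List[List[T]], s2: List[List[T]]) -> List[List[T]]:
    """Hypothetical model of DStream.union: for each batch interval, the output
    batch holds the elements of both inputs' batches for that same interval."""
    # Both streams must cover the same batch intervals (same slide duration).
    if len(s1) != len(s2):
        raise ValueError("streams must cover the same batch intervals")
    return [b1 + b2 for b1, b2 in zip(s1, s2)]


# Two hypothetical receivers, each producing two batches.
receiver_a = [["a1", "a2"], ["a3"]]
receiver_b = [["b1"], ["b2", "b3"]]
print(dstream_union(receiver_a, receiver_b))
# [['a1', 'a2', 'b1'], ['a3', 'b2', 'b3']]
```

When there are more than two input streams, PySpark's `StreamingContext.union(*dstreams)` combines them in a single call instead of chaining `union`.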
Join
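When called on two DStreams of (K, V) and (K, W) pairs, join returns a new DStream of (K, (V, W)) pairs with all pairs of elements for each key. The join is applied batch by batch: in each batch interval, the RDD generated by one stream is joined with the RDD generated by the other in that same interval: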
stream1 = ...
stream2 = ...
joinedDStream = stream1.join(stream2)
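The phrase "all pairs of elements for each key" and the batch-by-batch pairing can be checked with a pure-Python model. This is a sketch, not Spark's implementation. The helper names `batch_join` and `dstream_join` are hypothetical, and each DStream is again modeled as a list of batches of (key, value) tuples. The join is an inner join, so keys that appear on only one side of a batch are dropped.

```python
from collections import defaultdict
from typing import Dict, List, Tuple, TypeVar

K = TypeVar("K")
V = TypeVar("V")
W = TypeVar("W")


def batch_join(left: List[Tuple[K, V]], right: List[Tuple[K, W]]) -> List[Tuple[K, Tuple[V, W]]]:
    """Inner join of one batch: every V is paired with every W sharing its key."""
    right_by_key: Dict[K, List[W]] = defaultdict(list)
    for k, w in right:
        right_by_key[k].append(w)
    # .get avoids inserting empty lists for keys missing on the right side.
    return [(k, (v, w)) for k, v in left for w in right_by_key.get(k, [])]


def dstream_join(s1: List[List[Tuple[K, V]]], s2: List[List[Tuple[K, W]]]) -> List[List[Tuple[K, Tuple[V, W]]]]:
    """Hypothetical model of DStream.join: batches are joined interval by interval."""
    return [batch_join(b1, b2) for b1, b2 in zip(s1, s2)]


left_stream = [[("a", 1), ("a", 2), ("b", 3)], [("c", 4)]]
right_stream = [[("a", "x"), ("c", "y")], [("c", "z"), ("c", "w")]]
print(dstream_join(left_stream, right_stream))
# [[('a', (1, 'x')), ('a', (2, 'x'))], [('c', (4, 'z')), ('c', (4, 'w'))]]
```

Key `"c"` appears in the right stream's first batch but in the left stream's second batch. Those two records are never matched, because only RDDs from the same interval are joined.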
Transform operation
The transform operation can be used to apply any RDD operation that is not available in the DStream API. For example, joining a DStream with a dataset is...
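In PySpark, joining every batch with a static dataset would look roughly like `stream.transform(lambda rdd: rdd.join(static_rdd))`. Here `static_rdd` is a hypothetical pair RDD of reference data. The pure-Python model below sketches the idea. `dstream_transform` and `join_with_static` are hypothetical names, and `STATIC_COUNTRIES` is made-up illustrative data assumed to have unique keys. The function passed to transform sees a whole batch at a time, so it can perform batch-level operations such as a join against data that is not part of the stream.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def dstream_transform(
    stream: List[List[T]], func: Callable[[List[T]], List[U]]
) -> List[List[U]]:
    """Hypothetical model of DStream.transform: func receives a whole batch
    (standing in for that interval's RDD) and returns a new batch."""
    return [func(batch) for batch in stream]


# Hypothetical static dataset: user ID -> country, loaded once, not streamed.
STATIC_COUNTRIES = [("u1", "DE"), ("u2", "IN")]


def join_with_static(batch: List[Tuple[str, str]]) -> List[Tuple[str, Tuple[str, str]]]:
    """Inner join of one batch with the static dataset (keys assumed unique)."""
    lookup = dict(STATIC_COUNTRIES)
    return [(k, (v, lookup[k])) for k, v in batch if k in lookup]


events = [[("u1", "login"), ("u3", "login")], [("u2", "logout")]]
print(dstream_transform(events, join_with_static))
# [[('u1', ('login', 'DE'))], [('u2', ('logout', 'IN'))]]
```

Because the static data sits outside the stream, every batch is joined against the same reference records. The event for `"u3"` is dropped because it has no match.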