Summary
Streaming examples for other systems could have been provided as well, but there was no room for them in this chapter. Twitter streaming was examined by example in the Checkpointing section. This chapter has provided practical examples of data recovery via checkpointing in Spark Streaming. It has also touched on the performance limitations of checkpointing and shown that the checkpointing interval should be set to five to ten times the Spark Streaming batch interval.
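As a brief reminder of how the two ideas fit together, the following is a minimal sketch rather than a listing from this chapter. It assumes a TCP text source; the application name, checkpoint directory, host, port, and the helper checkpointInterval are hypothetical names chosen for illustration. It recovers a context from checkpoint data when that data exists, and sets the stream's checkpoint interval to five times the batch interval, the lower end of the suggested range.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Duration, Seconds, StreamingContext}

object CheckpointSketch {

  // Hypothetical values, chosen for illustration only.
  val checkpointDir = "hdfs:///tmp/checkpoint-sketch"
  val batchInterval: Duration = Seconds(10)

  // Scales the batch interval by a factor; 5 is the lower end of the 5-10x range.
  def checkpointInterval(batch: Duration, factor: Int = 5): Duration = batch * factor

  // Builds a fresh context. getOrCreate only calls this when the
  // checkpoint directory holds no usable checkpoint data.
  def createContext(): StreamingContext = {
    val conf = new SparkConf().setAppName("CheckpointSketch")
    val ssc = new StreamingContext(conf, batchInterval)
    ssc.checkpoint(checkpointDir) // enable checkpointing for this context

    // Assumed TCP source on localhost:10777 (hypothetical host and port).
    val lines = ssc.socketTextStream("localhost", 10777)
    val counts = lines.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)

    // Checkpoint this stream every 50 seconds (5 x the 10 second batch interval).
    counts.checkpoint(checkpointInterval(batchInterval))
    counts.print()
    ssc
  }

  def main(args: Array[String]): Unit = {
    // Recover from existing checkpoint data if present, otherwise build anew.
    val ssc = StreamingContext.getOrCreate(checkpointDir, () => createContext())
    ssc.start()
    ssc.awaitTermination()
  }
}
```

On older Spark releases, the pair operations such as reduceByKey may additionally need import org.apache.spark.streaming.StreamingContext._ to be in scope. The factor of 5 is only a starting point; a larger value within the range reduces checkpointing overhead at the cost of longer recovery.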
Checkpointing provides a stream-based recovery mechanism in the case of Spark application failure. This chapter has provided some stream-based worked examples for TCP-, file-, Flume-, and Kafka-based Spark stream coding. All the examples here are based on Scala and compiled with sbt. If you are more familiar with Maven, the following tutorial explains how to set up a Maven-based Scala project: http://www.scala-lang.org/old/node/345.