In the previous chapter, we learned about Apache Spark, a near real-time processing engine that processes data in micro-batches. For very low-latency applications, however, where even a few seconds of delay can cause serious problems, Spark may not be a good fit. Such applications need a framework that can handle millions of records per second and process them record by record, rather than in batches, to keep latency low. In this chapter, we will learn about Apache Storm, a real-time processing engine. Storm was originally created at BackType and was open sourced by Twitter after it acquired the company; it later became an Apache project.
We will cover the following topics:
- Introduction to Apache Storm
- Apache Storm architecture
- Brief overview of Apache Heron
- Integrating Apache Storm with Apache Kafka (Java/Scala example)
- Use case (log processing)