Storm is an open-source, distributed, resilient, real-time processing engine. Nathan Marz started it in late 2010 while working at BackType. On his blog, he describes the challenges he faced while building Storm. It is a must-read: http://nathanmarz.com/blog/history-of-apache-storm-and-lessons-learned.html.
Here is the crux of the blog post: initially, real-time processing was implemented by pushing messages into a queue, then having workers written in Python or any other language read the messages from the queue and process them one by one. The challenges with this approach are:
- If processing a message fails, the message has to be put back into the queue for reprocessing
- The queues and the workers (processing units) have to be kept up and running all the time
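
To make the first challenge concrete, here is a minimal sketch of the queue-and-worker approach described above. It is an illustration under stated assumptions, not code from BackType or from Storm: the `process` handler, the `run_worker` function, and the `max_attempts` retry limit are all hypothetical names, and an in-memory `queue.Queue` stands in for a real message broker.

```python
import queue


def process(message):
    # Hypothetical handler: pretend that the message "bad" cannot be processed.
    if message == "bad":
        raise ValueError("cannot process message")
    return message.upper()


def run_worker(messages, max_attempts=3):
    """Read messages from a queue one by one, re-queueing any that fail."""
    q = queue.Queue()
    for m in messages:
        q.put((m, 1))  # (message, attempt number)

    processed, dropped = [], []
    while not q.empty():
        message, attempt = q.get()
        try:
            processed.append(process(message))
        except ValueError:
            if attempt < max_attempts:
                # The application itself must put the message back for reprocessing.
                q.put((message, attempt + 1))
            else:
                # Assumed policy: give up after max_attempts tries.
                dropped.append(message)
    return processed, dropped


processed, dropped = run_worker(["a", "bad", "b"])
```

Even in this toy version, the retry bookkeeping lives in the worker, and nothing restarts the worker or the queue if either one goes down. That is the second challenge in the list.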
What follows are two key ideas by Nathan that make Storm capable of being a highly reliable and real...