Introducing the lambda architecture
To the best of my knowledge, Nathan Marz, the creator of Apache Storm, first introduced the lambda architecture in a 2011 blog post, which you can read at http://nathanmarz.com/blog/how-to-beat-the-cap-theorem.html. In this post, Nathan proposes a new type of system with two parts: a layer that calculates historical views over large datasets, and a real-time layer that answers queries about real-time or near-real-time data. He calls these the batch layer and the real-time layer.
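To make the split between the two layers concrete, here is a toy, in-memory sketch. It is not Nathan's implementation and uses no real big-data framework. The names `PageView`, `batch_view`, `RealtimeView`, `query`, and the `cutoff` timestamp are all hypothetical. The batch layer recomputes a view (page-view counts per URL) from every fact recorded before `cutoff`. The real-time layer incrementally counts facts that arrived after that point. A query merges the two views.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PageView:
    """Hypothetical factual record: one HTTP request for `url` at time `ts` (seconds)."""
    url: str
    ts: int


def batch_view(master: list[PageView], cutoff: int) -> Counter:
    """Batch layer: recompute the view from scratch over all facts older than `cutoff`."""
    return Counter(r.url for r in master if r.ts < cutoff)


class RealtimeView:
    """Real-time layer: incrementally count facts the last batch run has not absorbed."""

    def __init__(self, cutoff: int) -> None:
        self.cutoff = cutoff
        self.counts: Counter = Counter()

    def update(self, record: PageView) -> None:
        # Only facts at or after the batch cutoff belong in the real-time view.
        if record.ts >= self.cutoff:
            self.counts[record.url] += 1


def query(url: str, batch: Counter, realtime: RealtimeView) -> int:
    """Answer a query by merging the precomputed batch view with the real-time view."""
    return batch[url] + realtime.counts[url]


if __name__ == "__main__":
    master = [PageView("/a", 10), PageView("/b", 20), PageView("/a", 30)]
    batch = batch_view(master, cutoff=100)
    realtime = RealtimeView(cutoff=100)

    # New facts go to the master dataset (for the next batch run)
    # and to the real-time layer (so queries see them immediately).
    for rec in [PageView("/a", 105), PageView("/c", 110)]:
        master.append(rec)
        realtime.update(rec)

    print(query("/a", batch, realtime))  # 2 from batch + 1 from real-time -> 3
    print(query("/c", batch, realtime))  # 0 from batch + 1 from real-time -> 1

    # A later batch run absorbs the recent facts, so the real-time view can start over.
    batch = batch_view(master, cutoff=200)
    realtime = RealtimeView(cutoff=200)
    print(query("/a", batch, realtime))  # 3, now served entirely by the batch view
```

The design choice the sketch highlights is that the batch view is always recomputed from the full set of recorded facts rather than patched in place. The real-time view only has to cover the gap since the last batch run, and it can be thrown away once a new batch view catches up.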
The lambda architecture grew out of the problem of answering queries over data that is continuously updated. It's important to keep in mind the type of data we're dealing with here. Streaming data in this context consists of factual records. Some examples of streaming factual data are the following:
- The temperature at a given location at a given time
- An HTTP log record from a web server
- The price of Bitcoin from a given exchange at a given time
You can imagine the case where...