Apache Samza
Samza is an open source project from LinkedIn and is currently an incubation project at the Apache Software Foundation. Samza is a lightweight distributed stream-processing framework to do real-time processing of data. The version that is available for download from the Apache website is not the production version that LinkedIn uses.
Samza is made up of the following three layers:
A streaming layer
An execution layer
A processing layer
Samza provides out-of-the-box support for all the preceding three layers:
Streaming: This layer is supported by Kafka (another open source project from LinkedIn)
Execution: supported by YARN
Processing: supported by Samza API
The following three pieces fit together to form Samza:
The following architecture should be familiar to anyone who has used Hadoop:
Before going into each of these three layers indepth, it should be noted that Samza's support is not limited to these systems. Both Samza's execution and streaming layers are pluggable and allow developers...