Samza on Mesos
Samza is an open-source distributed stream-processing framework originally developed at LinkedIn. It offers the following features:
A simple API
State management
Fault tolerance
Durability
Scalability
Pluggability
Processor isolation
Important concepts of Samza
The key concepts in Samza are described in the following sections.
Streams
Samza processes streams of data, for example, website clickstreams, server logs, or any other event data. Messages can be appended to a stream and read from it. Multiple frameworks can access the same stream, and the data can be partitioned based on the keys present in the messages.
Jobs
A Samza job is the computation logic that reads data from input streams, applies transformations to it, and writes the resulting messages to one or more output streams.
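The read, transform, and write cycle of a job can be illustrated with a small conceptual model. This is only a sketch in plain Python, not the Samza API (Samza itself runs on the JVM); the names `run_job`, `keep_errors`, and the dictionary-based message shape are hypothetical and chosen purely for illustration.

```python
from typing import Callable, Dict, Iterable, List, Optional

# Hypothetical message shape for illustration: a plain dict of strings.
Message = Dict[str, str]


def run_job(
    input_streams: Iterable[Iterable[Message]],
    transform: Callable[[Message], Optional[Message]],
    output_streams: List[List[Message]],
) -> None:
    """Read each input message, transform it, and append the result to every output stream."""
    for stream in input_streams:
        for message in stream:
            result = transform(message)
            if result is None:  # the transformation may drop a message
                continue
            for out in output_streams:
                out.append(result)


def keep_errors(message: Message) -> Optional[Message]:
    """Example transformation: keep only ERROR log lines and upper-case their text."""
    if message["level"] != "ERROR":
        return None
    return {"level": message["level"], "text": message["text"].upper()}


# Assumed sample data: a tiny server-log stream.
server_logs = [
    {"level": "INFO", "text": "started"},
    {"level": "ERROR", "text": "disk full"},
]
alerts: List[Message] = []
run_job([server_logs], keep_errors, [alerts])
print(alerts)
```

In this model a stream is just an in-memory list; in a real deployment the streams would live in an external messaging system and the job would run continuously rather than over a finite list.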
Partitions
Every stream is split into one or more partitions. Every partition is an ordered sequence of messages.
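To make the idea concrete, the following sketch models a stream as a fixed number of append-only partitions and routes each message by a hash of its key. It is a conceptual illustration only: the class `PartitionedStream`, its methods, and the choice of CRC32 modulo the partition count are assumptions made for this example, not a description of how Samza or its underlying messaging system assigns partitions.

```python
import zlib
from typing import List, Tuple


class PartitionedStream:
    """Hypothetical in-memory stream split into ordered, append-only partitions."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions: List[List[Tuple[str, str]]] = [
            [] for _ in range(num_partitions)
        ]

    def partition_for(self, key: str) -> int:
        # crc32 is stable across runs, unlike Python's built-in hash() for strings.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key: str, value: str) -> Tuple[int, int]:
        """Append a message and return (partition index, position within that partition)."""
        index = self.partition_for(key)
        self.partitions[index].append((key, value))
        return index, len(self.partitions[index]) - 1


clicks = PartitionedStream(num_partitions=4)
first = clicks.append("user-1", "/home")
second = clicks.append("user-1", "/cart")
other = clicks.append("user-2", "/home")

# Messages with the same key share a partition and keep their relative order.
print(first[0] == second[0], second[1] > first[1])
```

Because the partition is derived from the key alone, all messages for one key are read back in the order in which they were written, while messages for different keys may be spread across partitions.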
Tasks
A job is subdivided into multiple tasks to parallelize the computation. Every task...