The custom ZeroMQ cluster
In this section, we will create a task pipeline, a messaging pattern that we mentioned briefly when discussing the Disco project's answer to MapReduce. Task pipelines can be viewed as a generalization of MapReduce: data flows through a sequence of steps, and each step passes the results of its processing on to the next step in the flow. We will build this with ZeroMQ, creating a messaging topology suited to running embarrassingly parallel code across a number of worker processes.
Note
The descriptive term embarrassingly parallel came into common use in online discussions of parallelization after it appeared in an article titled Matrix Computation on Distributed Memory Multiprocessors, written by Cleve Moler. Parallelizing code that was designed for serial execution is notoriously difficult, so the phrase was used to describe problems that were obviously or easily parallelizable.
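The data flow of the task pipeline described at the start of this section can be illustrated with a minimal sketch. This is not the ZeroMQ implementation that this section builds. So that the sketch runs without pyzmq installed, it substitutes the standard library's queue.Queue and threads for ZeroMQ sockets and worker processes (in ZeroMQ, this pattern is usually built from PUSH and PULL sockets). The stage names ventilator, worker, and sink follow the pipeline terminology of the ZeroMQ Guide, and run_pipeline is a hypothetical helper introduced only for illustration. Each worker processes its tasks independently of the others, which is what makes the workload embarrassingly parallel.

```python
import queue
import threading

# Unique sentinel object marking "no more work"; using a dedicated object
# (rather than None) lets tasks and results legitimately be None.
STOP = object()


def ventilator(tasks, task_queue, n_workers):
    """First stage: push every task into the pipeline, then one STOP per worker."""
    for task in tasks:
        task_queue.put(task)
    for _ in range(n_workers):
        task_queue.put(STOP)


def worker(task_queue, result_queue, func):
    """Middle stage: pull tasks, process each independently, push results downstream."""
    while True:
        task = task_queue.get()
        if task is STOP:
            result_queue.put(STOP)  # tell the sink this worker is finished
            break
        result_queue.put(func(task))


def sink(result_queue, n_workers):
    """Last stage: collect results until every worker has reported STOP."""
    results = []
    finished = 0
    while finished < n_workers:
        item = result_queue.get()
        if item is STOP:
            finished += 1
        else:
            results.append(item)
    return results


def run_pipeline(tasks, func, n_workers=4):
    """Hypothetical helper wiring ventilator -> workers -> sink together."""
    task_queue = queue.Queue()
    result_queue = queue.Queue()
    workers = [
        threading.Thread(target=worker, args=(task_queue, result_queue, func))
        for _ in range(n_workers)
    ]
    for w in workers:
        w.start()
    ventilator(tasks, task_queue, n_workers)
    results = sink(result_queue, n_workers)
    for w in workers:
        w.join()
    return results


if __name__ == "__main__":
    # Results arrive in whatever order the workers finish, so sort them.
    squares = run_pipeline(range(10), lambda x: x * x)
    print(sorted(squares))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Because CPython threads share the global interpreter lock, this sketch demonstrates the topology rather than a speedup for CPU-bound work; distributing the same ventilator, worker, and sink flow across separate processes is the job the ZeroMQ version takes on.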
ZeroMQ is an asynchronous messaging framework, which...