Concurrent pipelines
Imagine that we have to process jobs at a certain throughput, where every job consists of the same sequence of differently sized tasks: an I/O task (task A), a memory-bound task (task B) and, again, an I/O task (task C). A naïve approach would be to create a thread pool and run each job on it, but we soon realize that this is not optimal: because the OS schedules the threads unpredictably, we cannot control or predict the utilization of each I/O resource. We also observe that, even though several concurrent jobs have similar I/O tasks, this first approach gives us no way to batch them.
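As a point of reference, the naïve approach could look roughly like the sketch below. The functions `task_a`, `task_b` and `task_c` are hypothetical stand-ins for the real tasks, and the pool size of 4 is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins: the real tasks would do I/O (A, C) or memory-bound work (B).
def task_a(job_id):
    return f"a{job_id}"

def task_b(a_result):
    return a_result.upper()

def task_c(b_result):
    return f"{b_result}-done"

def run_job(job_id):
    # The whole job (A, then B, then C) runs on whichever worker picks it up.
    return task_c(task_b(task_a(job_id)))

def run_all(job_ids, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the inputs.
        return list(pool.map(run_job, job_ids))
```

Because every worker runs all three tasks back to back, there is no single place where, say, all pending task A requests are visible at once, which is why batching and per-resource tuning are hard here.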
As the next iteration, we split each job into stages (A, B, C), such that each stage corresponds to one task. Since the tasks are well known, we create one thread pool (of appropriate size) per stage and execute that stage's tasks in it. The result of task A is required by task B, and B's result is required by task C; we enable this communication via queues. Now, we can tune the thread...
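The staged design described above can be sketched as follows. This is only a minimal illustration using Python's standard `queue` and `threading` modules: the helper names (`start_stage`, `stop_stage`, `run_pipeline`), the sentinel-based shutdown and the default pool sizes are all assumptions made for the example, not part of the original design.

```python
import queue
import threading

STOP = object()  # sentinel that tells a worker thread to exit

def start_stage(fn, in_q, out_q, workers):
    """Start `workers` threads that apply fn to items from in_q and push results to out_q."""
    def worker():
        while True:
            item = in_q.get()
            if item is STOP:
                break
            out_q.put(fn(item))
    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    return threads

def stop_stage(threads, in_q):
    """Send one sentinel per worker, then wait for the stage to drain."""
    for _ in threads:
        in_q.put(STOP)
    for t in threads:
        t.join()

def run_pipeline(jobs, task_a, task_b, task_c, sizes=(4, 2, 4)):
    # One queue in front of each stage, plus one for the final results.
    q_in, q_ab, q_bc, q_out = (queue.Queue() for _ in range(4))
    stage_a = start_stage(task_a, q_in, q_ab, sizes[0])
    stage_b = start_stage(task_b, q_ab, q_bc, sizes[1])
    stage_c = start_stage(task_c, q_bc, q_out, sizes[2])
    for job in jobs:
        q_in.put(job)
    # Shut the stages down in order, so each one finishes before the next is told to stop.
    stop_stage(stage_a, q_in)
    stop_stage(stage_b, q_ab)
    stop_stage(stage_c, q_bc)
    results = []
    while not q_out.empty():
        results.append(q_out.get())
    return results  # completion order, not necessarily input order
```

Each stage's pool size is an independent knob in `sizes`, and every queue is a natural place to inspect backlog or to pull several pending items at once if batching is wanted. Results arrive in completion order, so a caller that needs input order would have to carry a job identifier through the stages.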