The reducers library
The count operation we implemented previously is a sequential algorithm: each line is processed one at a time until the sequence is exhausted. But nothing about the operation requires that it be done this way.
We could split the lines into two sequences (ideally of roughly equal length) and reduce over each sequence independently. When we're done, we would just add the two line counts together to get the total number of lines in the file.
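The split-and-sum idea can be sketched sequentially first. This is a minimal sketch, not the earlier count implementation: `count-lines` is a hypothetical stand-in for it (a reduce that adds one per line), and `split-count` is a hypothetical helper that assumes the input's length is cheap to obtain, as it is for a vector.

```clojure
(require '[clojure.string :as str])

;; Hypothetical stand-in for the earlier count operation:
;; a sequential reduce that adds one to the total for each line.
(defn count-lines [lines]
  (reduce (fn [total _] (inc total)) 0 lines))

;; Hypothetical helper: split the lines into two halves of roughly
;; equal length, count each half independently, then add the totals.
(defn split-count [lines]
  (let [[left right] (split-at (quot (count lines) 2) lines)]
    (+ (count-lines left) (count-lines right))))

(split-count (str/split-lines "a\nb\nc\nd\ne")) ;; => 5
```

Both halves are still counted one after the other here; the point is only that the two reductions do not depend on each other.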
If each reduce ran on its own processing unit, the two count operations would run in parallel. All other things being equal, the algorithm would run twice as fast. This is one of the aims of the clojure.core.reducers library: to bring the benefit of parallelism to algorithms implemented on a single machine by taking advantage of multiple cores.
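To make the two reductions actually run on separate threads, each half can be handed to its own future. This is a hand-rolled sketch using plain Clojure futures, not the reducers library; `count-lines` and `parallel-split-count` are hypothetical names. Whether it runs faster depends on the number of cores and the size of the input, since starting threads has its own cost.

```clojure
;; Hypothetical sequential count: a reduce that adds one per line.
(defn count-lines [lines]
  (reduce (fn [total _] (inc total)) 0 lines))

;; Hypothetical helper: count each half in its own future (thread),
;; then add the two partial totals.
(defn parallel-split-count [lines]
  (let [[left right] (split-at (quot (count lines) 2) lines)
        left-total   (future (count-lines left))
        right-total  (future (count-lines right))]
    ;; deref (@) blocks until each future has finished its reduce
    (+ @left-total @right-total)))

(parallel-split-count (vec (range 100000))) ;; => 100000
```

Futures run on Clojure's agent thread pool, so a standalone script using this approach should call `(shutdown-agents)` once it is finished, otherwise the JVM may linger before exiting.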
Parallel folds with reducers
The reducers library's parallel implementation of reduce is called fold...
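As a rough illustration of fold, the line count can be expressed with `clojure.core.reducers/fold`, which takes a combining function and a reducing function. This is a minimal sketch under stated assumptions: `count-step` and `fold-count` are hypothetical names, and the input is converted to a vector because fold only splits foldable collections such as vectors and maps (other collections fall back to a sequential reduce).

```clojure
(require '[clojure.core.reducers :as r])

;; Hypothetical reducing function applied within each chunk: adds one per line.
(defn count-step [total _]
  (inc total))

(defn fold-count
  "Hypothetical helper. r/fold splits the vector into chunks (512 elements
  by default), reduces the chunks in parallel on a fork/join pool with
  count-step, and merges the partial totals with +. Calling (+) with no
  arguments returns 0, which seeds each chunk's reduction."
  [lines]
  (r/fold + count-step (vec lines)))

(fold-count (map str (range 10000))) ;; => 10000
```

Here + plays the role of adding the per-sequence totals together, just as in the two-sequence split, but fold chooses the number of pieces itself.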