MapReduce and the aggregation framework
MapReduce is a programming model: we split the data into independent chunks, process each chunk in parallel (the map step), and then combine the intermediate results into the final result (the reduce step). Basically, we hand out many parallel tasks to mappers. These mappers (which can be threads, processes, or servers, among others) each process a specific part of the dataset and emit results to the reducers. As the reducers keep receiving this information, they update the final results with it. This is essentially the divide-and-conquer approach.
Nothing explains this better than an example! Suppose we want to count authors by the first letter of their name; this is a good case for using MapReduce. We want to see output like the following:
Authors starting with "a": 1020
Authors starting with "b": 477
Authors starting with "c": 719
Authors starting with "d": 586
Authors starting with "e": 678
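To make the map and reduce steps concrete, here is a minimal sketch in plain Ruby. It is only an illustration of the idea, not MongoDB's MapReduce API: the sample names, the helper names `map_name`, `reduce_counts`, and `count_by_first_letter`, and the choice of threads as mappers are all assumptions made for this example.

```ruby
# Hypothetical sample data; real names would come from the database.
NAMES = ["alice", "Bob", "anna", "carol", "Brian", "adam"].freeze

# Map step: emit a [key, value] pair for a single author name.
def map_name(name)
  [name[0].downcase, 1]
end

# Reduce step: combine all values emitted for one key.
def reduce_counts(values)
  values.sum
end

# Split the names into slices, let one thread (a "mapper") process each
# slice, then group the emitted pairs by key and reduce each group.
def count_by_first_letter(names, workers: 2)
  slice_size = [(names.size / workers.to_f).ceil, 1].max

  mappers = names.each_slice(slice_size).map do |slice|
    Thread.new { slice.map { |name| map_name(name) } }
  end

  # Thread#value waits for the thread and returns its block's result.
  pairs = mappers.flat_map(&:value)

  pairs.group_by(&:first)
       .map { |letter, emitted| [letter, reduce_counts(emitted.map(&:last))] }
       .sort
       .to_h
end

COUNTS = count_by_first_letter(NAMES)

COUNTS.each do |letter, count|
  puts "Authors starting with \"#{letter}\": #{count}"
end
```

Because each mapper only looks at its own slice and the reducer only sums values for one key, the slices can be processed in any order, which is what makes the work easy to spread across threads, processes, or servers.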
First, let's create many authors in our database. For this, we shall use the faker gem so that we can generate nice...