Programming model
MapReduce provides an easy way to write parallel programs without having to worry about message passing or synchronization, which makes it useful for complex aggregation tasks and searches. As we can observe in the following figure, MapReduce can work with less organized data (such as noisy data, free text, or schemaless documents) than traditional relational databases can. However, its programming model is more procedural, which means that the user needs programming skills in a language such as Java, Python, JavaScript, or C. MapReduce requires two functions: the map function, which creates a list of key-value pairs, and the reduce function, which iterates over the values collected for each key and applies an operation (such as merging or summarization) to produce the output.
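To make the two roles concrete, here is a minimal single-process sketch in Python using the classic word-count example. The names `map_fn`, `reduce_fn`, and `run_map_reduce` are hypothetical and chosen for illustration only; this is not MongoDB's API. The driver simply groups the emitted values by key before calling the reduce function once per key, which is the job a real MapReduce framework performs across many nodes.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_fn(document: str) -> Iterator[tuple[str, int]]:
    """Map (hypothetical): emit a (word, 1) key-value pair for every word."""
    for word in document.lower().split():
        yield word, 1


def reduce_fn(key: str, values: Iterable[int]) -> int:
    """Reduce (hypothetical): summarize all values emitted for one key."""
    return sum(values)


def run_map_reduce(documents: Iterable[str]) -> dict[str, int]:
    # Map phase: collect every emitted pair, grouping the values by key.
    grouped: dict[str, list[int]] = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    # Reduce phase: exactly one reduce call per distinct key.
    return {key: reduce_fn(key, values) for key, values in grouped.items()}


docs = ["the quick brown fox", "the lazy dog", "The fox"]
print(run_map_reduce(docs))
# {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```

Note that neither function knows anything about threads or other nodes; that separation is what lets a framework run them in parallel without the user handling synchronization.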
In MapReduce, the data can be split across several nodes (sharding); in that case, we also need a partition function. The partition function is in charge of sorting and load balancing: it decides which node receives each intermediate key. In MongoDB, we can work over sharded collections automatically...
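A common way to implement a partition function is hash partitioning: hash the key and take the result modulo the number of partitions, so that every pair with the same key lands on the same node and keys spread roughly evenly across nodes. The Python sketch below assumes this hash-based scheme; `partition_fn` and `shuffle` are hypothetical names, and this is not a description of how MongoDB partitions data internally. It separates the two duties mentioned above: routing pairs to a partition (load balancing) and sorting keys within each partition.

```python
import zlib
from collections import defaultdict
from typing import Iterable


def partition_fn(key: str, num_partitions: int) -> int:
    """Assign a key to one of num_partitions nodes (hypothetical helper)."""
    # zlib.crc32 is deterministic across processes, unlike Python's built-in
    # hash() for strings, so every node computes the same assignment.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def shuffle(
    pairs: Iterable[tuple[str, int]], num_partitions: int
) -> list[list[tuple[str, list[int]]]]:
    """Route each (key, value) pair to its partition, then sort by key."""
    partitions = [defaultdict(list) for _ in range(num_partitions)]
    for key, value in pairs:
        partitions[partition_fn(key, num_partitions)][key].append(value)
    # Sort keys inside each partition so a reducer receives them in order.
    return [sorted(p.items()) for p in partitions]


pairs = [("fox", 1), ("the", 1), ("dog", 1), ("the", 1)]
for index, part in enumerate(shuffle(pairs, 2)):
    print(index, part)
```

Because the assignment depends only on the key, both `("the", 1)` pairs end up in the same partition, so a single reduce call can see all of the values for that key.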