Technology and implementation options for scaling-up Machine learning
In this section, we will explore some parallel programming techniques and distributed platform options that Machine learning implementations can adopt. The Hadoop platform will be introduced in the next chapter, and we will look into some practical examples starting from Chapter 3, An Introduction to Hadoop's Architecture and Ecosystem with some real-world examples.
MapReduce programming paradigm
MapReduce is a parallel programming paradigm that abstracts the parallelizing computing and data complexities in a distributed computing environment. It works on the concept of taking the compute function to the data rather than taking the data to the compute function.
MapReduce is more of a programming framework that comes with many built-in functions that the developer need not worry about building, and can alleviate many implementation complexities like data partitioning, scheduling, managing exceptions, and intersystem...