About Hadoop
Let's start with a commonly circulated explanation of Hadoop.
According to the Apache Hadoop article on wikipedia.org (2016):
"Hadoop is an open-source software 'framework' for distributed storage and distributed processing (of very large datasets) on computer clusters built from commodity hardware."
The following is a visualization that may help you understand the master-to-slave architecture used by Hadoop:
Hadoop uses a programming model called MapReduce. In this design, one node in the cluster acts as the master, which splits a job into tasks and distributes them to the slave nodes. Each slave node applies a map step to its own portion of your data, and the intermediate results are then combined, or reduced, into a single output result. So, you can now see that the name MapReduce makes sense: the processing is mapped across the cluster, and the results are reduced to one output.
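To make the map and reduce steps more concrete, here is a minimal sketch of a word count written in plain Python. It does not use Hadoop or any Hadoop API; it only imitates the idea on a single machine. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample text are assumptions made up for this illustration, and each string in `chunks` stands in for the portion of data a slave node would receive.

```python
from collections import defaultdict

def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(pairs):
    """Group the emitted values by key, so each word has a list of counts."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: collapse the list of counts for one word into a single total."""
    return key, sum(values)

def word_count(chunks):
    """Run map over every chunk, group by word, then reduce each group."""
    mapped = []
    for chunk in chunks:  # in a real cluster, each chunk would go to a different node
        mapped.extend(map_phase(chunk))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

# Hypothetical input: three chunks standing in for data split across three nodes.
chunks = ["big data big clusters", "data on commodity hardware", "big hardware"]
counts = word_count(chunks)
print(counts)
```

In this sketch, the loop over `chunks` plays the role of the master handing out work, and the final dictionary is the single output result. A real cluster would run the map steps in parallel on separate machines rather than one after another.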
Hadoop is able to take your data and split it up (or distribute it) over a number of computers that have space or resources available.
These...