Hadoop
Apache Hadoop is a set of software components created for the parallel storage and computation of large volumes of data. The main idea at its inception was to use commonly available (commodity) computers in a distributed fashion, providing distributed computation with high resiliency against failure. As the project succeeded, higher-end machines started to appear in Hadoop clusters, although commodity hardware remains a common use case.
By parallel storage, we mean any system that stores and retrieves data in a parallel fashion, using several nodes interconnected by a network.
Hadoop is composed of the following:
Hadoop Common: the common utilities and libraries that support the other Hadoop modules
Hadoop YARN: a cluster resource manager and job scheduler
Hadoop MapReduce: a large-scale parallel processing engine
Hadoop Distributed File System (HDFS): as the name suggests, HDFS is a file system that can be distributed over several machines, using their local disks, to create a large storage pool (a short example of how these pieces fit together follows this list).
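
To see how these components work together, the following is a minimal sketch of the classic word count job written against the org.apache.hadoop.mapreduce API. It assumes the Hadoop client libraries are on the classpath and that the two command-line arguments point at an HDFS input directory and a not-yet-existing HDFS output directory; the class names and paths are illustrative, not part of any particular cluster setup.

// A minimal sketch of a MapReduce job (classic word count), assuming the
// Hadoop client libraries are available and that args[0] and args[1] are
// HDFS paths supplied by the user.
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: emit (word, 1) for every word in the input split.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: sum the counts emitted for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);  // combine locally before the shuffle
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // HDFS output directory
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

On a configured cluster, a job like this would typically be packaged into a JAR and submitted with the hadoop jar command. The input and output live on HDFS, YARN allocates the containers in which the map and reduce tasks run, and MapReduce expresses the computation itself.
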
Another important component...