Hadoop for big data
Apache Hadoop is a fully open source software framework built for two fundamental tasks: storing and processing big data. It has long been a leading tool for distributed parallel processing of data spread across multiple servers, and it scales horizontally by adding commodity machines to a cluster. Because of its scalability, flexibility, fault tolerance, and low cost, many cloud-based solution vendors, financial institutions, and enterprises rely on Hadoop for their big data needs.
The Hadoop framework contains three modules that are critical to its operation: the Hadoop Distributed File System (HDFS), Yet Another Resource Negotiator (YARN), and MapReduce.
HDFS
HDFS is Hadoop's own file system, designed to be scalable and portable. It stores large files across multiple nodes in a Hadoop cluster and can hold datasets ranging from gigabytes to terabytes of data. Files written to the cluster are split into smaller blocks, typically 128 MB each, and the blocks are distributed throughout the cluster...
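To make this concrete, the following sketch uses Hadoop's Java FileSystem API to write a file into HDFS and then print its block size and the hosts holding each block. The NameNode address (hdfs://namenode:8020), the file path, and the explicit dfs.blocksize setting are illustrative assumptions rather than values from this text; on a real cluster these normally come from the cluster configuration.

import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsBlockDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical NameNode address; a real deployment reads this from core-site.xml.
        conf.set("fs.defaultFS", "hdfs://namenode:8020");
        // Request the typical 128 MB block size for files created by this client.
        conf.setLong("dfs.blocksize", 128L * 1024 * 1024);

        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/demo/example.txt");  // illustrative path

        // Write a small file; HDFS splits larger files into blocks of the
        // configured size and spreads those blocks across the DataNodes.
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
        }

        // Ask the NameNode where the file's blocks ended up.
        FileStatus status = fs.getFileStatus(file);
        System.out.println("Block size: " + status.getBlockSize() + " bytes");
        for (BlockLocation loc : fs.getFileBlockLocations(status, 0, status.getLen())) {
            System.out.println("Block at offset " + loc.getOffset()
                    + " stored on " + String.join(", ", loc.getHosts()));
        }
        fs.close();
    }
}

In practice the block size (dfs.blocksize) and related storage settings are usually defined cluster-wide in hdfs-site.xml rather than set per client as shown here.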