Chapter 12. Analytics Using Hadoop
Hadoop has come to the fore because of its capability to aid in data analytics. As data grows in the dimensions of volume, velocity, and variety, there need to be systems capable of analyzing it efficiently and effectively. Vertically scaling hardware to handle this data is not viable because it is expensive and difficult to manage. Distributed computing and horizontal scaling are better options, and frameworks such as Hadoop automatically cater to the fault-tolerance, scaling, and distribution needs of such a system.
Analytics is all about data. A question that frequently arises is: when does Hadoop become overkill? Typically, Hadoop is recommended for datasets of 1 TB and upwards. However, when the rate of data growth is difficult to predict, it may still be a good idea to use Hadoop MapReduce because of its attractive "code once, deploy at any scale" characteristic.
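The "code once, deploy at any scale" property comes from the MapReduce programming model itself: a job is expressed as a map function and a reduce function, and the framework handles partitioning, shuffling, and parallel execution, so the same logic runs unchanged on one node or a thousand. As a rough illustration of the model (not Hadoop's actual Java API), the following Python sketch simulates the map, shuffle-and-sort, and reduce phases of the canonical word-count job; the function names are hypothetical, chosen only for this example:

```python
from itertools import groupby
from operator import itemgetter

def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def reducer(word, counts):
    """Reduce phase: sum all counts observed for a single word."""
    return (word, sum(counts))

def run_job(lines):
    """Stand-in for the framework: apply the mapper to every input line,
    shuffle-and-sort the intermediate pairs by key, then apply the reducer
    once per distinct key. In real Hadoop, these steps are distributed
    across the cluster; the mapper and reducer code stays the same."""
    mapped = [pair for line in lines for pair in mapper(line)]
    mapped.sort(key=itemgetter(0))  # the framework's shuffle-and-sort step
    return dict(
        reducer(word, (count for _, count in group))
        for word, group in groupby(mapped, key=itemgetter(0))
    )
```

Because the user-supplied code is only the mapper and reducer, scaling from a laptop test to a multi-terabyte cluster run is a deployment decision, not a rewrite.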
There are organizations that use Hadoop...