Chapter 4. Data Access Components – Hive and Pig
Hadoop can usually hold terabytes or petabytes of data to process; hence Data Access is an extremely important aspect in any project or product, especially with Hadoop. As we deal with Big Data for processing data, we will have to perform some ad hoc processing to get insights of data and design strategies. Hadoop's basic processing layer is MapReduce, which as we discussed earlier, is a massively parallel processing framework that is scalable, faster, adaptable, and fault tolerant.
We will look at some limitations of MapReduce programming and some programming abstraction layers such as Hive and Pig in detail, which can execute MapReduce using a user friendly language for faster development and management. Hive and Pig are quite useful and handy when it comes to easily do some ad hoc analysis and some not very complex analysis.