Impala
Impala is a new member of the Hadoop ecosystem. Its beta version first became available in 2012, and the first stable release followed in June 2013. Even though Impala is a new project and still has many areas that need improvement, the significance of its goal makes it worth mentioning in this book. That goal is very ambitious: bringing real-time queries to Hadoop. Hive made it possible to use a SQL-like language to query data in Hadoop, but its performance was still limited by the MapReduce framework. Projects such as Stinger (http://hortonworks.com/labs/stinger/) are dedicated to significantly improving Hive performance, but they are still in development.
Impala bypasses MapReduce and operates on the data directly in HDFS to achieve significant performance improvements. Impala is written mostly in C++. It uses RAM buffers to cache data and generally operates more like a parallel relational database. Impala...