Impala is a modern, open source massive parallel processing (MPP) SQL engine designed to work with a Hadoop environment. It provides the ability to execute queries with low latency. Hive does not meet the expectation for use cases requiring interactive analytics in a multi-user environment. Impala is integrated into the Hadoop environment and uses a number of standard Hadoop components such as Metastore, HDFS, HBase, YARN, and Sentry. Unlike hive, it does not run MapReduce jobs to get results. Hive uses the MapReduce engine for execution and the intermediate output results are stored on disk, which acts as an input to another job.Â
Impala
Impala architecture
Impala is a massive parallel processing (MPP...