Configuring Kerberos for Hadoop ecosystem components
The Hadoop ecosystem is growing continuously and maturing with increasing enterprise adoption. In this section, we look at some of the most important Hadoop ecosystem components, their architecture, and how they can be secured.
Securing Hive
Hive lets users run SQL-like queries over data stored in HDFS. Its query engine converts each query submitted by the user into a pipeline of MapReduce jobs, which are submitted to Hadoop (the JobTracker or ResourceManager) for execution. The results of the MapReduce jobs are then presented back to the user or stored in HDFS. The following figure shows a high-level interaction of a business user working with Hive to run Hive queries on Hadoop:
A Hadoop user can interact with Hive and run Hive queries in several ways:
The user can run Hive queries directly using the command-line interface (CLI). The CLI connects to the Hive metastore using...