Hive server modes and setup
In this recipe, we will look at how to set up a Hive server and use it to query data stored in a distributed file system.
Apache Hive is a client-side library that provides a data warehouse solution: it represents data on HDFS in a structured format and lets you query it using SQL. The table definitions and mappings are stored in a metastore, which is a combination of a service and a database.
The Hive metastore can run in any of three modes: standalone, local metastore, and remote metastore mode. Standalone or embedded mode is not used in production as it limits the number of connections to just one, and everything runs inside a single JVM.
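As a rough illustration of how the three modes differ in practice, the sketch below builds the hive-site.xml properties that would typically distinguish them. The property names (javax.jdo.option.ConnectionURL, javax.jdo.option.ConnectionDriverName, hive.metastore.uris) are standard Hive settings, but the host names, the database name, the choice of MySQL for local mode, and the helper functions are assumptions made only for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical hosts and database name, used only for illustration.
METASTORE_DB_HOST = "db.example.com"
METASTORE_HOST = "metastore.example.com"


def metastore_properties(mode):
    """Return the hive-site.xml properties that typically set each mode apart."""
    if mode == "embedded":
        # Derby database and metastore service both run inside the Hive JVM.
        return {
            "javax.jdo.option.ConnectionURL":
                "jdbc:derby:;databaseName=metastore_db;create=true",
            "javax.jdo.option.ConnectionDriverName":
                "org.apache.derby.jdbc.EmbeddedDriver",
        }
    if mode == "local":
        # Metastore service stays in the Hive JVM; the database is external.
        # MySQL and its legacy Connector/J driver class are assumed here.
        return {
            "javax.jdo.option.ConnectionURL":
                f"jdbc:mysql://{METASTORE_DB_HOST}:3306/hive_metastore",
            "javax.jdo.option.ConnectionDriverName": "com.mysql.jdbc.Driver",
        }
    if mode == "remote":
        # Client-side setting: talk to a separate metastore service over Thrift.
        return {"hive.metastore.uris": f"thrift://{METASTORE_HOST}:9083"}
    raise ValueError(f"unknown metastore mode: {mode}")


def to_hive_site_xml(properties):
    """Render a property dict as a hive-site.xml style <configuration> string."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")
```

Calling to_hive_site_xml(metastore_properties("remote")) returns a configuration fragment that could be merged into hive-site.xml on a client. In remote mode, the JDBC settings would live on the metastore service host instead of on each client.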
Three components make up a Hive connection: the Hive driver, the metastore interface, and the database. The default database is Derby, which is used in standalone mode. In production, an external JDBC-compliant database, such as MySQL, is used in place of Derby, as Derby supports only one client connection which would not...