Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Newsletter Hub
Free Learning
Arrow right icon
timer SALE ENDS IN
0 Days
:
00 Hours
:
00 Minutes
:
00 Seconds
Arrow up icon
GO TO TOP
Apache Hive Cookbook

You're reading from   Apache Hive Cookbook

Arrow left icon
Product type Paperback
Published in Apr 2016
Publisher Packt
ISBN-13 9781782161080
Length 268 pages
Edition 1st Edition
Languages
Tools
Arrow right icon
Toc

Table of Contents (14) Chapters Close

Preface 1. Developing Hive 2. Services in Hive FREE CHAPTER 3. Understanding the Hive Data Model 4. Hive Data Definition Language 5. Hive Data Manipulation Language 6. Hive Extensibility Features 7. Joins and Join Optimization 8. Statistics in Hive 9. Functions in Hive 10. Hive Tuning 11. Hive Security 12. Hive Integration with Other Frameworks Index

Installing Hive

We will now take a look at installing Hive along with all the prerequisites.

Getting ready

Let's download the stable version from one of the mirrors:

$ wget http://a.mbbsindia.com/hive/hive-1.2.1/apache-hive-1.2.1-bin.tar.gz

How to do it…

This can be achieved in three ways.

Hive with an embedded metastore

Once you have downloaded the Hive tar-ball file, installing and setting up a Hive is pretty simple and straightforward. Extract the compressed tar:

$tar –xzvf apache-hive-1.2.1-bin.tar.gz

Export the location where Hive is extracted as the environment variable HIVE_HOME:

$ cd  apache-hive-1.2.1-bin
$ export HIVE_HOME={{pwd}}

Hive has all its installation scripts in the $HIVE_HOME/bin directory. Export this location to the PATH environment variable so that you can run all scripts from any location directly from a command-line:

$ export PATH=$HIVE_HOME/bin:$PATH

Alternatively, if you want to set the Hive path permanently for the user, then make the entry of Hive environment variables in the .bashrc or .bash_profile files available or could be created in the user's home folder:

  1. Add the following to ~/.bash_profile:
    export HIVE_HOME=/home/hduser/apache-hive-1.2.1-bin
    export PATH=$PATH:$HIVE_HOME/bin
    
  2. Here, hduser is the name of user with which you have logged in and Hive-1.2.1 is the Hive directory extracted from the tar file.
Run Hive from a terminal:
    hive
    
  3. Make sure that the Hive node has a connection to Hadoop cluster, which means Hive would be installed on any of the Hadoop nodes, or Hadoop configurations are available in the node's class path.
  4. This installation uses the embedded Derby database and stores the data on the local filesystem. Only one Hive session can be open on the node.
  5. If different users try to run the Hive shell, the second would get the Failed to start database 'metastore_db' error.
  6. Run Hive queries for the datastore to test the installation:
    hive> SHOW TABLES;
    hive> CREATE TABLE sales(id INT, product String, age INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
    
  7. Logs are generated per user bases in the /tmp/<usrename> folder.

Hive with a local metastore

Follow these steps to configure Hive with the local metastore. Here, we are using the MySQL database as a metastore:

  1. Add following to ~/.bash_profile:
    export HIVE_HOME=/home/hduser/apache-hive-1.2.1-bin
    export PATH=$PATH:$HIVE_HOME/bin
    

    Here, hduser is the user name, and apache-hive-1.2.1-bin is the Hive directory extracted from the tar file.

  2. Install a SQL database such as MySQL on the same machine where you want to run Hive.
  3. For the Ubuntu, MySQL could be installed by running the following command on the node's terminal:
    sudo apt-get install mysql-server
    
  4. In case of MySql, Hive needs the mysql-connector jar. Download the latest mysql-connector jar from http://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.35.tar.gz and copy it to the lib folder of your Hive home directory.
  5. Create a file, hive-site.xml, in the conf folder of Hive and add the following entries to it:
    <configuration>
    <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/metastore_db?createDatabaseIfNotExist=true</value>
    <description>metadata is stored in a MySQL server</description>
    </property>
    <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>MySQL JDBC driver class</description>
    </property>
    <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hduser</value>
    <description>user name for connecting to mysql server     
    </description>
    </property>
    <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>passwd</value>
    <description>password for connecting to mysql server</description>
    </property>
    </configuration>
  6. Run Hive from the terminal:
    hive
    

Note

There is a known "JLine" jar conflict issue with Hadoop 2.6.0 and Hive 1.2.1. If you are getting the error "unable to load class jline.terminal," you need to remove the older version of the jline jar from the yarn lib folder using the following command:

sudo rm -r $HADOOP_PREFIX/share/hadoop/yarn/lib/jline-0.9.94.jar

Hive with a remote metastore

Follow these steps to configure Hive with a remote metastore.

  1. Download the latest version of Hive from http://a.mbbsindia.com/hive/hive-1.2.1/apache-hive-1.2.1-bin.tar.gz.
  2. Extract the package:
    tar –xzvf apache-hive-1.2.1-bin.tar.gz
    
  3. Add the following to ~/.bash_profile:
    sudo nano ~/.bash_profile
    export HIVE_HOME=/home/hduser/apache-hive-1.2.1-bin
    export PATH=$PATH:$HIVE_HOME/bin
    

    Here, hduser is the user name and apache-hive-1.2.1-bin is the Hive directory extracted from the tar file.

  4. Install a SQL database such as MySQL on a remote machine to be used for the metastore.
  5. For Ubuntu, MySQL can be installed with the following command:
    sudo apt-get install mysql-server
    
  6. In the case of MySQL, Hive needs the mysql-connector jar file. Download the latest mysql-connector jar from http://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.35.tar.gz and copy it to the lib folder of your Hive home directory.
  7. Add the following entries to hive-site.xml:
    <configuration>
    <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://<ip_of_remote_host>:3306/metastore_db?createDatabaseIfNotExist=true</value>
    <description>metadata is stored in a MySQL server</description>
    </property>
    <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value><description>MySQL JDBC driver class</description>
    </property>
    <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hduser</value>
    <description>user name for connecting to mysql server     
    </description>
    </property>
    <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>passwd</value>
    <description>password for connecting to mysql server</description>
    </property>
    </configuration>
  8. Start the Hive metastore interface:
    bin/hive --service metastore &
    
  9. Run Hive from the terminal:
    hive
    
  10. The Hive metastore interface by default listens at port 9083:
    netstat -an | grep 9083
    
  11. Start the Hive shell and make sure that the Hive Data Definition Language and Data Manipulation Language (DDL or DML) operations are working by creating tables in Hive.

Note

There is a known "JLine" jar conflict issue with Hadoop 2.6.0 and Hive 1.2.1. If you are getting the error "unable to load class jline.terminal," you need to remove the older version of jline jar from the yarn lib folder using the following command:

sudo rm -r $HADOOP_PREFIX/share/hadoop/yarn/lib/jline-0.9.94.jar
lock icon The rest of the chapter is locked
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime
Banner background image