Apache Hive Cookbook
By Hanish Bansal, Saurabh Chauhan, and Shrey Mehrotra
Paperback Apr 2016 268 pages 1st Edition

Chapter 1. Developing Hive

In this chapter, we will cover the following recipes:

  • Deploying Hive on a Hadoop cluster
  • Deploying Hive Metastore
  • Installing Hive
  • Configuring HCatalog
  • Understanding different components of Hive
  • Compiling Hive from source
  • Hive packages
  • Debugging Hive
  • Running Hive
  • Changing configurations at runtime

Introduction

Hive, an Apache Hadoop ecosystem component, was developed by Facebook to query the data stored in the Hadoop Distributed File System (HDFS). HDFS is the data storage layer of Hadoop; at a very high level, it divides the data into blocks (128 MB by default) and stores these blocks on different nodes.
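
As a rough illustration of the block arithmetic, the following shell sketch estimates how many HDFS blocks a file of a given size occupies. The function name hdfs_block_count is hypothetical, and the 128 MB default comes from the text above; a real cluster may be configured with a different block size.

```shell
#!/bin/sh
# Hypothetical helper: number of HDFS blocks needed for a file of SIZE bytes.
# Assumes the 128 MB default block size unless a second argument overrides it.
hdfs_block_count() {
  hbc_size_bytes=$1
  hbc_block_bytes=${2:-$((128 * 1024 * 1024))}
  # Integer ceiling division: a partially filled last block still counts.
  echo $(( (hbc_size_bytes + hbc_block_bytes - 1) / hbc_block_bytes ))
}

# Example: a 1 GB file and a 300 MB file.
hdfs_block_count $((1024 * 1024 * 1024))   # prints 8
hdfs_block_count $((300 * 1024 * 1024))    # prints 3
```

The last block of the 300 MB file is only partly filled, which is why the count is rounded up.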

Hive provides a SQL-like query model named Hive Query Language (HQL) to access and analyze big data. It is also described as the data warehousing framework of Hadoop and provides various analytical features, such as windowing and partitioning.

Deploying Hive on a Hadoop cluster

Hive is supported by a wide variety of platforms. GNU/Linux and Windows are commonly used as the production environment, whereas Mac OS X is commonly used as the development environment.

Getting ready

In this book, the installation steps and other instructions assume a GNU/Linux-based installation of Apache Hive.

Before installing Hive, the first step is to make sure that a Java SE environment is installed properly. Hive 1.2 and later require Java 7 or later (earlier Hive releases also ran on Java 6). Java can be downloaded from http://www.oracle.com/technetwork/java/javase/downloads/index.html.

How to do it...

To install Hive, just download it from http://Hive.apache.org/downloads.html and unpack it. Choose the latest stable version.

Note

At the time of writing this book, Hive 1.2.1 was the latest stable version available.

How it works…

By default, Hive is configured to use an embedded Derby database whose disk storage location is determined by the Hive configuration variable named javax.jdo.option.ConnectionURL. By default, this location is set to ./metastore_db in the conf/hive-default.xml file. With Derby as the metastore in embedded mode, Hive allows at most one user at a time.

The other modes of installation are Hive with local metastore and Hive with remote metastore, which will be discussed later.

Deploying Hive Metastore

Apache Hive is a client-side library that provides a table-like abstraction on top of the data in HDFS for data processing. Hive jobs are converted into a MapReduce plan, which is then submitted to the Hadoop cluster. A Hadoop cluster is a set of nodes or machines on which HDFS, MapReduce, and YARN are deployed. MapReduce works on the distributed data stored in HDFS and processes large datasets in parallel, in contrast to traditional processing engines that run the whole task on a single machine and can take hours or days to answer a single query. Yet Another Resource Negotiator (YARN) manages the RAM and CPU cores of the whole cluster, which are critical for running any process on a node.

The Hive table and database definitions, and their mapping to the data in HDFS, are stored in a metastore. A metastore is a central repository for Hive metadata. It consists of two main components, both of which are important for working with Hive:

  • Services to which the client connects and queries the metastore
  • A backing database to store the metadata

Getting ready

In this book, the installation steps and other instructions assume a GNU/Linux-based installation of Apache Hive.

Before installing Hive, the first step is to make sure that a Java SE environment is installed properly. Hive 1.2 and later require Java 7 or later (earlier Hive releases also ran on Java 6). Java can be downloaded from http://www.oracle.com/technetwork/java/javase/downloads/index.html.

How to do it…

In Hive, a metastore (service and RDBMS database) could be configured in one of the following ways:

  • An embedded metastore
  • A local metastore
  • A remote metastore

When we install Hive on the preinstalled Hadoop cluster, Hive, by default, gets the embedded database. This means that we need not configure any database as a Hive metastore. Let's check out what these configurations are and why we call them the embedded and remote metastore.

By default, the metastore service and the Hive service run in the same JVM. Hive needs a database to store metadata. In default mode, it uses an embedded Derby database stored on the local file system. The embedded mode of Hive has the limitation that only one session can be open at a time from the same location on a machine, because only one embedded Derby instance can acquire the lock and access the database files on disk.

An embedded metastore runs as a single service in a single JVM and cannot serve multiple nodes at a time.

To overcome this limitation, a separate RDBMS database can run on the same node. The metastore service and the Hive service still run in the same JVM. This configuration mode is named local metastore. Here, local means that the metastore service runs in the same JVM as the Hive service and that the database runs on the same node.

There is one more configuration, in which one or more metastore servers run in a JVM process separate from the Hive service and connect to a database on a remote machine. This configuration is named remote metastore.

The Hive service is configured to use a remote metastore by setting hive.metastore.uris to metastore server URIs, separated by commas. The Hive metastore could be configured using properties specified in the following sections.
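
To make the comma-separated format concrete, here is a minimal shell sketch that joins a list of metastore hosts into a value suitable for hive.metastore.uris. The function name, the example host names, and the METASTORE_PORT variable are hypothetical; the sketch assumes the Thrift URI scheme and the default metastore port 9083.

```shell
#!/bin/sh
# Hypothetical helper: join metastore hosts into a hive.metastore.uris value.
# Assumes the thrift:// scheme and port 9083 unless METASTORE_PORT is set.
build_metastore_uris() {
  bmu_uris=""
  for bmu_host in "$@"; do
    # ${var:+...} adds the separating comma only between entries.
    bmu_uris="${bmu_uris:+$bmu_uris,}thrift://${bmu_host}:${METASTORE_PORT:-9083}"
  done
  echo "$bmu_uris"
}

build_metastore_uris meta1.example.com meta2.example.com
# prints thrift://meta1.example.com:9083,thrift://meta2.example.com:9083
```

The printed string can be pasted into the value element of the hive.metastore.uris property.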

The original diagram, a pictorial representation of the metastore and driver, is not reproduced here.

The metastore is configured with the following properties:

<property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
    <description>The directory relative to fs.default.name where managed tables are stored.
    </description>
</property>

<property>
    <name>hive.metastore.uris</name>
    <value></value>
    <description>The URIs specifying the remote metastore servers to connect to. If there are multiple remote servers, clients connect in a round-robin fashion.
    </description>
</property>

<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=hivemetastore;create=true</value>
    <description>The JDBC URL of the database.
    </description>
</property>

<property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.apache.derby.jdbc.EmbeddedDriver</value>
    <description>The JDBC driver class name.
    </description>
</property>

<property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>username</value>
    <description>The metastore username to connect with.
    </description>
</property>

<property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>password</value>
    <description>The metastore password to connect with.
    </description>
</property>

Installing Hive

We will now take a look at installing Hive along with all the prerequisites.

Getting ready

Let's download the stable version from one of the mirrors:

$ wget http://a.mbbsindia.com/hive/hive-1.2.1/apache-hive-1.2.1-bin.tar.gz

How to do it…

This can be achieved in three ways.

Hive with an embedded metastore

Once you have downloaded the Hive tarball, installing and setting up Hive is simple and straightforward. Extract the compressed tar file:

$ tar -xzvf apache-hive-1.2.1-bin.tar.gz

Export the location where Hive is extracted as the environment variable HIVE_HOME:

$ cd apache-hive-1.2.1-bin
$ export HIVE_HOME=$(pwd)

Hive has all its installation scripts in the $HIVE_HOME/bin directory. Export this location to the PATH environment variable so that you can run all scripts from any location directly from a command-line:

$ export PATH=$HIVE_HOME/bin:$PATH

Alternatively, if you want to set the Hive path permanently for the user, add the Hive environment variables to the .bashrc or .bash_profile file in the user's home folder (create the file if it does not exist):

  1. Add the following to ~/.bash_profile:
    export HIVE_HOME=/home/hduser/apache-hive-1.2.1-bin
    export PATH=$PATH:$HIVE_HOME/bin
    
    Here, hduser is the name of the user with which you have logged in, and apache-hive-1.2.1-bin is the Hive directory extracted from the tar file.

  2. Run Hive from a terminal:
    hive
    
  3. Make sure that the Hive node has a connection to the Hadoop cluster; that is, Hive is installed on one of the Hadoop nodes, or the Hadoop configuration is available on the node's classpath.
  4. This installation uses the embedded Derby database and stores the data on the local filesystem. Only one Hive session can be open on the node.
  5. If a second user tries to run the Hive shell at the same time, it fails with the Failed to start database 'metastore_db' error.
  6. Run Hive queries against the datastore to test the installation:
    hive> SHOW TABLES;
    hive> CREATE TABLE sales(id INT, product STRING, age INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
    
  7. Logs are generated on a per-user basis in the /tmp/<username> folder.
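
The environment setup from step 1 can be made safe to run more than once. The following is a minimal sketch, not part of the original recipe: the function name add_hive_to_path is hypothetical, and the default HIVE_HOME is the example location used above.

```shell
#!/bin/sh
# Hypothetical helper: add $HIVE_HOME/bin to PATH only if it is not there yet.
# HIVE_HOME defaults to the example location used in this recipe.
add_hive_to_path() {
  HIVE_HOME=${HIVE_HOME:-/home/hduser/apache-hive-1.2.1-bin}
  export HIVE_HOME
  case ":$PATH:" in
    *":$HIVE_HOME/bin:"*) ;;                         # already present, do nothing
    *) PATH="$PATH:$HIVE_HOME/bin"; export PATH ;;   # append once
  esac
}

add_hive_to_path
add_hive_to_path   # the second call leaves PATH unchanged
```

Placing the function and one call in ~/.bash_profile avoids duplicate PATH entries when the profile is sourced repeatedly.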

Hive with a local metastore

Follow these steps to configure Hive with the local metastore. Here, we are using the MySQL database as a metastore:

  1. Add the following to ~/.bash_profile:
    export HIVE_HOME=/home/hduser/apache-hive-1.2.1-bin
    export PATH=$PATH:$HIVE_HOME/bin
    

    Here, hduser is the user name, and apache-hive-1.2.1-bin is the Hive directory extracted from the tar file.

  2. Install a SQL database such as MySQL on the same machine where you want to run Hive.
  3. On Ubuntu, MySQL can be installed by running the following command in the node's terminal:
    sudo apt-get install mysql-server
    
  4. In the case of MySQL, Hive needs the mysql-connector jar. Download the latest mysql-connector jar from http://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.35.tar.gz and copy it to the lib folder of your Hive home directory.
  5. Create a file, hive-site.xml, in the conf folder of Hive and add the following entries to it:
    <configuration>
    <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/metastore_db?createDatabaseIfNotExist=true</value>
    <description>metadata is stored in a MySQL server</description>
    </property>
    <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>MySQL JDBC driver class</description>
    </property>
    <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hduser</value>
    <description>user name for connecting to mysql server     
    </description>
    </property>
    <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>passwd</value>
    <description>password for connecting to mysql server</description>
    </property>
    </configuration>
  6. Run Hive from the terminal:
    hive
    

Note

There is a known "JLine" jar conflict issue with Hadoop 2.6.0 and Hive 1.2.1. If you are getting the error "unable to load class jline.terminal," you need to remove the older version of the jline jar from the yarn lib folder using the following command:

sudo rm -r $HADOOP_PREFIX/share/hadoop/yarn/lib/jline-0.9.94.jar

Hive with a remote metastore

Follow these steps to configure Hive with a remote metastore.

  1. Download the latest version of Hive from http://a.mbbsindia.com/hive/hive-1.2.1/apache-hive-1.2.1-bin.tar.gz.
  2. Extract the package:
    tar -xzvf apache-hive-1.2.1-bin.tar.gz
    
  3. Open ~/.bash_profile in an editor and add the following:
    nano ~/.bash_profile
    export HIVE_HOME=/home/hduser/apache-hive-1.2.1-bin
    export PATH=$PATH:$HIVE_HOME/bin
    

    Here, hduser is the user name and apache-hive-1.2.1-bin is the Hive directory extracted from the tar file.

  4. Install a SQL database such as MySQL on a remote machine to be used for the metastore.
  5. For Ubuntu, MySQL can be installed with the following command:
    sudo apt-get install mysql-server
    
  6. In the case of MySQL, Hive needs the mysql-connector jar file. Download the latest mysql-connector jar from http://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.35.tar.gz and copy it to the lib folder of your Hive home directory.
  7. Add the following entries to hive-site.xml:
    <configuration>
    <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://<ip_of_remote_host>:3306/metastore_db?createDatabaseIfNotExist=true</value>
    <description>metadata is stored in a MySQL server</description>
    </property>
    <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>MySQL JDBC driver class</description>
    </property>
    <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hduser</value>
    <description>user name for connecting to mysql server     
    </description>
    </property>
    <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>passwd</value>
    <description>password for connecting to mysql server</description>
    </property>
    </configuration>
  8. Start the Hive metastore service:
    bin/hive --service metastore &
    
  9. Run Hive from the terminal:
    hive
    
  10. The Hive metastore service listens on port 9083 by default. Check that the port is open:
    netstat -an | grep 9083
    
  11. Start the Hive shell and make sure that the Hive Data Definition Language (DDL) and Data Manipulation Language (DML) operations are working by creating tables in Hive.
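
On machines that only run Hive clients, the configuration can point at the running metastore service through hive.metastore.uris instead of carrying the database credentials. The following sketch writes such a minimal client-side hive-site.xml. The function name, the scratch directory, and the host metastore.example.com are hypothetical; the port is the default 9083 mentioned in step 10.

```shell
#!/bin/sh
# Hypothetical helper: write a minimal client-side hive-site.xml that points
# Hive at a remote metastore. Arguments: target directory, metastore URI list.
write_client_hive_site() {
  wchs_conf_dir=$1
  wchs_uris=$2
  mkdir -p "$wchs_conf_dir"
  cat > "$wchs_conf_dir/hive-site.xml" <<EOF
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>$wchs_uris</value>
    <description>Remote metastore servers to connect to</description>
  </property>
</configuration>
EOF
}

# Example: write into a scratch directory rather than $HIVE_HOME/conf.
write_client_hive_site "${TMPDIR:-/tmp}/hive-conf-sketch" "thrift://metastore.example.com:9083"
```

Review the generated file before copying it into the conf folder of a real Hive installation.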

Note

There is a known "JLine" jar conflict issue with Hadoop 2.6.0 and Hive 1.2.1. If you are getting the error "unable to load class jline.terminal," you need to remove the older version of jline jar from the yarn lib folder using the following command:

sudo rm -r $HADOOP_PREFIX/share/hadoop/yarn/lib/jline-0.9.94.jar

Configuring HCatalog

Assuming that Hive has been configured with a remote metastore, let's look into how to install and configure HCatalog.

Getting ready

The HCatalog CLI supports these command-line options:

Option | Usage | Description
-g | hcat -g mygrp | The HCatalog table that needs to be created must have the group "mygrp".
-p | hcat -p rwxrwxr-x | The HCatalog table that needs to be created must have the permissions "rwxrwxr-x".
-f | hcat -f myscript.hcat | Tells HCatalog that myscript.hcat is a file containing DDL commands to execute.
-e | hcat -e 'create table mytable(a int);' | Treats the following string as a DDL command and executes it.
-D | hcat -Dkey=value | Passes the key-value pair to HCatalog as a Java system property.
(none) | hcat | Prints a usage message.

How to do it...

From Hive 0.11.0 onward, HCatalog is packaged with the Hive binaries. Because we have already configured Hive, we can run the HCatalog command-line tool, hcat, from the shell. The script is available in the hcatalog/bin directory.

Understanding different components of Hive

Besides the Hive metastore, Hive components can be broadly classified as Hive clients and Hive servers. Hive servers provide interfaces that make the metastore available to external applications and check users' authentication and authorization, whereas Hive clients are the various applications used to access and execute Hive queries on the Hadoop cluster.

HiveServer

Let's take a look at its various components.

Hive metastore

The Hive metastore runs as a service on the port specified in the metastore URIs. The metastore provides APIs to query the databases, tables, schemas, and other entities stored in the RDBMS datastore.

How to do it...

The metastore service starts as a Java process in the backend. You can start the Hive metastore service with the following command:

hive --service metastore &

HiveServer2

HiveServer2 is an interface that allows clients to execute Hive queries and get the results. It is based on Thrift RPC and supports multiple concurrent clients, as against a single client in HiveServer. It also provides for the authentication and authorization of users.

How to do it...

The HiveServer2 service also starts as a Java process in the backend. You can start HiveServer2 with the following command:

hive --service hiveserver2 &

Hive clients

The following are the different clients available in Hive to query metastore data or to submit Hive queries to Hive servers.

Hive CLI

The following are the various sections included in Hive CLI.

Getting ready

Hive Command-line Interface (CLI) can be used to run Hive queries in either interactive or batch mode.

How to do it...

To run Hive CLI, use the following command:

$ $HIVE_HOME/bin/hive

Queries are submitted under the username of the user logged in to the UNIX system.

Beeline

The following are the various sections included in Beeline.

Getting ready

If you have configured HiveServer2, then a Beeline client can be used to interact with Hive.

How to do it...

To run Beeline, use the following command:

$ $HIVE_HOME/bin/beeline

Using Beeline, a connection can be made to any HiveServer2 instance with any username and password.
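
As an illustration of such a connection, the sketch below builds a HiveServer2 JDBC URL and prints the matching Beeline command. The function name and the host hs2.example.com are hypothetical; the sketch assumes HiveServer2's default port 10000 and the default database. Beeline's -u and -n options pass the URL and the username.

```shell
#!/bin/sh
# Hypothetical helper: build a HiveServer2 JDBC URL for Beeline.
# Assumes port 10000 and the "default" database unless overridden.
hs2_jdbc_url() {
  hs2_host=${1:-localhost}
  hs2_port=${2:-10000}
  hs2_db=${3:-default}
  echo "jdbc:hive2://${hs2_host}:${hs2_port}/${hs2_db}"
}

# Print the command instead of running it, so the sketch works without Hive.
echo "\$HIVE_HOME/bin/beeline -u $(hs2_jdbc_url hs2.example.com) -n hduser"
```

Note that the URL scheme is the lowercase hive2.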

Compiling Hive from source

In this recipe, we will see how to compile Hive from source.

Getting ready

Apache Hive is an open source framework that any user can compile and modify. The Hive source code is a Maven project. The build runs scripts at intermediate steps of the compilation, and these scripts are written for a UNIX platform.

The following prerequisites need to be installed:

  • UNIX OS: UNIX is preferable for Hive source compilation. Although the source can also be compiled on Windows, you need to comment out the execution of the intermediate scripts.
  • Maven: The following are the steps to configure Maven:
    1. Download the Apache Maven binaries for Linux (.tar.gz) from https://maven.apache.org/download.cgi.
      wget http://mirror.olnevhost.net/pub/apache/maven/maven-3/3.3.3/binaries/apache-maven-3.3.3-bin.tar.gz
      
    2. Extract the tar file:
      tar -xzvf apache-maven-3.3.3-bin.tar.gz
      
    3. Create a folder and move the extracted Maven directory to that folder:
      sudo mkdir -p /usr/lib/maven
      sudo mv apache-maven-3.3.3 /usr/lib/maven/
      
    4. Open /etc/profile:
      sudo nano /etc/profile
      
    5. Add the following variables to extend the PATH environment variable:
      export M2_HOME=/usr/lib/maven/apache-maven-3.3.3
      export M2=$M2_HOME/bin
      export PATH=$M2:$PATH
      
    6. Run source /etc/profile to apply the variables without a restart:
      source /etc/profile
      
    7. Check whether Maven is properly installed:
      mvn --version
      

How to do it...

Follow these steps to compile Hive on a Unix OS:

  1. Download the latest version of the Hive source tar file:
    wget http://a.mbbsindia.com/hive/hive-1.2.1/apache-hive-1.2.1-src.tar.gz
    
  2. Extract the source folder:
    tar -xzvf apache-hive-1.2.1-src.tar.gz
    
  3. Move to the Hive directory:
    cd apache-hive-1.2.1-src
    
  4. To import the Hive packages into Eclipse, run the following command:
    mvn eclipse:eclipse
    
  5. To compile Hive with Hadoop 2 binaries, run the following command:
    mvn clean install -Phadoop-2,dist
    
  6. If you want to skip test execution, run the earlier command with the following switch:
    mvn clean install -DskipTests -Phadoop-2,dist
    
  7. To generate a tarball file from the source code, run the following command:
    mvn clean package -DskipTests -Phadoop-2 -Pdist
    

Hive packages

The following are the various sections included in Hive packages.

Getting ready

Hive source consists of different modules categorized by the features they provide or as a submodule of some other module.

How to do it...

The following is the list of Hive modules and their usage in Hive:

  • accumulo-handler: Apache Accumulo is a distributed key-value datastore based on Google's Bigtable. This package includes the components responsible for mapping a Hive table to an Accumulo table. AccumuloStorageHandler and AccumuloPredicateHandler are the main classes responsible for mapping tables. For more information, refer to the official integration documentation available at https://cwiki.apache.org/confluence/display/Hive/AccumuloIntegration.
  • ant: This tool is used to build earlier versions of Hive source. Ant is also needed to configure the Hive Web Interface server.
  • beeline: A Hive client used to connect with HiveServer2 and run Hive queries.
  • bin: This package includes scripts to start Hive clients and services.
  • cli: This is a Hive Command-line Interface implementation.
  • common: These are utility classes used by other modules.
  • conf: This contains default configurations and uses defined configuration objects.
  • contrib: This contains SerDes, generic UDFs, and file formats contributed to Hive by third parties.
  • hbase-handler: This module allows Hive SQL statements to access HBase tables for SELECT and INSERT commands. It also provides interfaces to access HBase and Hive tables for join and union in a single query. More information is available at https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration.
  • hcatalog: This is a table management framework that helps other frameworks such as Pig or MapReduce to access the Hive metastore and table schema.
  • hwi: This module provides an implementation of a web interface to run Hive queries. Also, the WebHCat APIs provide REST APIs to access the Hive metastore.
  • jdbc: This is a connector that accepts JDBC connections and calls to execute Hive queries on the cluster.
  • metastore: This is the API that provides access to metastore entities including database, table, schema, and serdes.
  • odbc: This module implements the Open Database Connectivity (ODBC) API, enabling ODBC applications to connect and execute queries over Hive.
  • ql: This module provides an interface to clients that checks for query semantics and provides an implementation for driver, parser, and query planner.
  • serde: This module has an implementation of the serializer and deserializer used by Hive to read and write data. It helps in validating and parsing record and field types.
  • shims: This is the module that transparently intercepts and modifies calls to the Hive API, usually for compatibility purposes.
  • spark-client: This module provides an interface to execute Hive SQLs on a Spark framework.

Debugging Hive

Here, we will take a quick look at the command-line debugging option in Hive.

Getting ready

Hive code can be debugged by assigning a debug port to Hive and adding the socket details to the Hive JVM. To add the debugging configuration to Hive, export the following variables in an OS terminal or add them to the user's .bash_profile:

export HIVE_DEBUG_PORT=8000
export HIVE_DEBUG="-Xdebug -Xrunjdwp:transport=dt_socket,address=${HIVE_DEBUG_PORT},server=y,suspend=y"

How to do it...

Once a debug port is attached to Hive and Hive server suspension is enabled at startup, the following steps will help you debug Hive queries:

  1. After defining previously mentioned properties, run the Hive CLI in debug mode:
    hive --debug
    
  2. If you have written up your own Test class and want to execute unit test cases written in that class, then you need to execute the following command specifying the class name you want to execute:
    mvn test -Dtest=ClassName
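
The debug options exported earlier in this recipe are standard JDWP agent settings. The sketch below rebuilds the same option string with a configurable port and suspend flag; the function name jdwp_opts is hypothetical. With suspend=n, the JVM starts without waiting for a debugger, and a debugger can attach later to the same port.

```shell
#!/bin/sh
# Hypothetical helper: build the JDWP option string used for HIVE_DEBUG.
# Arguments: debug port (default 8000), suspend flag y or n (default y).
jdwp_opts() {
  jdwp_port=${1:-8000}
  jdwp_suspend=${2:-y}
  echo "-Xdebug -Xrunjdwp:transport=dt_socket,address=${jdwp_port},server=y,suspend=${jdwp_suspend}"
}

# Same port as in the recipe, but without suspending the JVM at startup.
HIVE_DEBUG="$(jdwp_opts 8000 n)"
export HIVE_DEBUG
echo "$HIVE_DEBUG"
```

Any JDWP-capable debugger, for example an IDE remote-debug configuration, can then connect to the chosen port.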
    

Running Hive

Let's see how to run Hive from the command-line.

Getting ready

Once you have the binaries of Hive either compiled or downloaded, you need to configure a metastore for Hive where it keeps information about different entities. Once that is configured, start Hive metastore and HiveServer2 to access the entities from different clients.

How to do it...

Follow these steps to start different components of Hive on a node:

  1. Run Hive CLI:
    $HIVE_HOME/bin/hive
    
  2. Run HiveServer2 in the background and connect to it with Beeline:
    $HIVE_HOME/bin/hiveserver2 &
    $HIVE_HOME/bin/beeline -u jdbc:hive2://$HiveServer2_HOST:$HiveServer2_PORT
    
  3. Start the HCatalog server:
    $HIVE_HOME/hcatalog/sbin/hcat_server.sh start
    
  4. Run the HCatalog CLI:
    $HIVE_HOME/hcatalog/bin/hcat
    
  5. Run WebHCat:
    $HIVE_HOME/hcatalog/sbin/webhcat_server.sh start
    

Changing configurations at runtime

Let's see how we can change various configuration settings at runtime.

How to do it...

Follow these steps to change any of the Hive configuration properties at runtime for a particular session or query:

  1. The configuration of Hive and the underlying MapReduce can be changed at runtime through Beeline or the CLI. The general syntax to set a property is as follows:
    SET key=value;
    
  2. A configuration set this way applies only to that session. If you want to set it permanently, you need to set it in hive-site.xml. The examples are as follows:
    beeline> SET mapred.job.tracker=example.host.com:50030;
    hive> SET hive.exec.mode.local.auto=false;
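
Properties can also be supplied for a single invocation when the CLI is launched, using the --hiveconf option of the hive command. The sketch below turns key=value pairs into --hiveconf arguments and prints the resulting command line; the function name hiveconf_args is hypothetical, and the command is printed rather than executed so that the sketch runs without Hive installed.

```shell
#!/bin/sh
# Hypothetical helper: turn key=value pairs into --hiveconf arguments.
hiveconf_args() {
  hca_args=""
  for hca_pair in "$@"; do
    # Separate consecutive options with a single space.
    hca_args="${hca_args:+$hca_args }--hiveconf $hca_pair"
  done
  echo "$hca_args"
}

# Print the command line rather than running it.
echo "hive $(hiveconf_args hive.exec.mode.local.auto=false) -e 'SHOW TABLES;'"
```

Like SET, values passed this way last only for that invocation; permanent values still belong in hive-site.xml.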
    

Key benefits

  • Grasp a complete reference of different Hive topics.
  • Get to know the latest recipes in development in Hive, including CRUD operations.
  • Understand Hive internals and the integration of Hive with different frameworks used in today’s world.

Description

Hive was developed by Facebook and later open sourced in the Apache community. Hive provides a SQL-like interface for running queries on big data frameworks. Its SQL-like syntax, also called HiveQL, includes SQL capabilities such as analytical functions, which are much needed in today’s big data world. This book provides easy installation steps for the different types of metastore supported by Hive, along with simple, easy-to-learn recipes for configuring Hive clients and services. You will also learn about different Hive optimizations, including partitions and bucketing. The book also explains the source code of the latest Hive version. Hive Query Language is used by other frameworks, including Spark; towards the end, you will cover the integration of Hive with these frameworks.

Who is this book for?

The book is intended for those who want to get started with Hive or who have a basic understanding of the Hive framework. Prior knowledge of basic SQL commands is also required.

What you will learn

  • Learn the different features and offerings of the latest Hive
  • Understand the working and structure of Hive internals
  • Get an insight into the latest developments in the Hive framework
  • Grasp the concepts of the Hive data model
  • Master key concepts such as partitions, buckets, and statistics
  • Know how to integrate Hive with other frameworks such as Spark and Accumulo

Product Details

Publication date: Apr 29, 2016
Length: 268 pages
Edition: 1st
Language: English
ISBN-13: 9781782161080
Vendor: Apache


Table of Contents

13 Chapters
1. Developing Hive
2. Services in Hive
3. Understanding the Hive Data Model
4. Hive Data Definition Language
5. Hive Data Manipulation Language
6. Hive Extensibility Features
7. Joins and Join Optimization
8. Statistics in Hive
9. Functions in Hive
10. Hive Tuning
11. Hive Security
12. Hive Integration with Other Frameworks
Index

Customer reviews

Rating distribution
3 out of 5 stars (4 Ratings)
5 star 0%
4 star 50%
3 star 25%
2 star 0%
1 star 25%
Mukesh Rao Oct 24, 2016
4 out of 5 stars
Good book to get a wide exposure to Hive...
Amazon Verified review
shubha Jul 05, 2020
4 out of 5 stars
Good book for beginners. More details would have been useful.
Amazon Verified review
Shopper Oct 05, 2016
3 out of 5 stars
It's just ok, explanations are not in detail.
Amazon Verified review
Sap Jan 24, 2020
1 out of 5 stars
Hive Sucks
Amazon Verified review

FAQs

What is included in a Packt subscription?

A subscription provides you with full access to view all Packt and licensed content online, including exclusive access to Early Access titles. Depending on the tier chosen, you can also earn credits and discounts to use for owning content.

How can I cancel my subscription?

To cancel your subscription, go to the account page, found in the top right of the page or at https://subscription.packtpub.com/my-account/subscription. From there you will see the ‘cancel subscription’ button in the grey box that contains your subscription information.

What are credits?

Credits can be earned by reading 40 sections of any title within the payment cycle, which is a month starting from the day of subscription payment. You also earn a credit every month if you subscribe to our annual or 18-month plans. Credits can be used to buy books DRM-free, the same way that you would pay for a book. Your credits can be found on the subscription homepage (subscription.packtpub.com) by clicking on the ‘my library’ dropdown and selecting ‘credits’.

What happens if an Early Access Course is cancelled?

Projects are rarely cancelled, but sometimes it's unavoidable. If an Early Access course is cancelled or excessively delayed, you can exchange your purchase for another course. For further details, please contact us here.

Where can I send feedback about an Early Access title?

If you have any feedback about the product you're reading, or Early Access in general, then please fill out a contact form here and we'll make sure the feedback gets to the right team. 

Can I download the code files for Early Access titles?

We try to ensure that all books in Early Access have code available to use, download, and fork on GitHub. This helps us be more agile in the development of the book, and helps keep the often changing code base of new versions and new technologies as up to date as possible. Unfortunately, however, there will be rare cases when it is not possible for us to have downloadable code samples available until publication.

When we publish the book, the code files will also be available to download from the Packt website.

How accurate is the publication date?

The publication date is as accurate as we can make it at any point in the project. Unfortunately, delays can happen. Often those delays are out of our control, such as changes to the technology code base or delays in the tech release. We do our best to give you an accurate estimate of the publication date at any given time, and the estimate becomes more accurate as more chapters are delivered.

How will I know when new chapters are ready?

We'll let you know every time there has been an update to a course that you've bought in Early Access. You'll get an email to let you know there has been a new chapter, or a change to a previous chapter. The new chapters are automatically added to your account, so you can also check back there any time you're ready and download or read them online.

I am a Packt subscriber, do I get Early Access?

Yes, all Early Access content is fully available through your subscription. You will need a paid-for or active trial subscription in order to access all titles.

How is Early Access delivered?

Early Access is currently only available as a PDF or through our online reader. As we make changes or add new chapters, the files in your Packt account will be updated so you can download them again or view them online immediately.

How do I buy Early Access content?

Early Access is a way for us to get our content to you quicker, but the method of buying an Early Access course is still the same. Just find the course you want to buy, go through the check-out steps, and you’ll get a confirmation email from us with information and a link to the relevant Early Access courses.

What is Early Access?

Keeping up to date with the latest technology is difficult; there are always new versions, new frameworks, and new techniques. This feature gives you a head start on our content as it's being created. With Early Access you'll receive each chapter as it's written, and get regular updates throughout the product's development, as well as the final course as soon as it's ready.

We created Early Access as a means of giving you the information you need, as soon as it's available. As we go through the process of developing a course, 99% of it can be ready but we can't publish until that last 1% falls into place. Early Access helps to unlock the potential of our content early, to help you start your learning when you need it most. You not only get access to every chapter as it's delivered, edited, and updated, but you'll also get the finalized, DRM-free product to download in any format you want when it's published. As a member of Packt, you'll also be eligible for our exclusive offers, including a free course every day, and discounts on new and popular titles.