Apache Hive Essentials

Immerse yourself on a fantastic journey to discover the attributes of big data by using Hive

Product type Paperback
Published in Feb 2015
Publisher Packt
ISBN-13 9781783558575
Length 208 pages
Edition 1st Edition
Author: Dayong Du
Table of Contents

Preface
1. Overview of Big Data and Hive
2. Setting Up the Hive Environment
3. Data Definition and Description
4. Data Selection and Scope
5. Data Manipulation
6. Data Aggregation and Sampling
7. Performance Considerations
8. Extensibility Considerations
9. Security Considerations
10. Working with Other Tools
Index

Hive overview

Hive is a standard for SQL queries over petabytes of data in Hadoop. It provides SQL-like access to data in HDFS, allowing Hadoop to be used like a data warehouse. The Hive Query Language (HQL) has semantics and functions similar to those of standard SQL in relational databases, so experienced database analysts can pick it up easily. HQL queries can run on different computing frameworks, such as MapReduce, Tez, and Spark, for better performance.

Hive's data model provides a high-level, table-like structure on top of HDFS. It supports three data structures: tables, partitions, and buckets. Tables correspond to HDFS directories and can be divided into partitions, which in turn can be divided into buckets. Hive supports most primitive data types, such as TIMESTAMP, STRING, FLOAT, BOOLEAN, DECIMAL, DOUBLE, INT, SMALLINT, and BIGINT, as well as complex data types, such as UNION, STRUCT, MAP, and ARRAY.
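To make the table, partition, and bucket hierarchy more concrete, here is a minimal Python sketch of how such a layout could map onto HDFS paths. It is an illustration under stated assumptions, not Hive's actual implementation: the warehouse root `/user/hive/warehouse` is the commonly used default and is often overridden, the `employee` table and its `year`/`month` partition columns are hypothetical, the bucket rule (non-negative hash modulo bucket count) is simplified to integer keys, and the `000002_0` style of bucket file name is assumed.

```python
from posixpath import join

# Assumed default warehouse root; many clusters configure a different location.
WAREHOUSE_DIR = "/user/hive/warehouse"


def partition_path(table, partition_spec, warehouse_dir=WAREHOUSE_DIR):
    """Return the directory for one partition of a table.

    partition_spec is an ordered list of (column, value) pairs; each pair
    becomes one nested `column=value` directory level under the table.
    """
    parts = [f"{column}={value}" for column, value in partition_spec]
    return join(warehouse_dir, table, *parts)


def bucket_index(int_key, num_buckets):
    """Pick a bucket for an integer key: non-negative hash modulo bucket count.

    Assumption: the hash of an integer key is the integer itself.
    Other column types would need their own hash function.
    """
    return (int_key & 0x7FFFFFFF) % num_buckets


def bucket_file(table, partition_spec, int_key, num_buckets):
    """Return the assumed path of the bucket file holding the given key."""
    index = bucket_index(int_key, num_buckets)
    # Assumed naming scheme: zero-padded bucket index plus a "_0" suffix.
    return join(partition_path(table, partition_spec), f"{index:06d}_0")


if __name__ == "__main__":
    # Hypothetical table partitioned by year and month, bucketed 4 ways.
    spec = [("year", 2015), ("month", 2)]
    print(partition_path("employee", spec))
    print(bucket_file("employee", spec, int_key=10, num_buckets=4))
```

Because each partition is its own directory, a query that filters on a partition column only has to read the matching directories, and bucketing further splits each partition into a fixed number of files by key.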

The following diagram shows the architecture of Hive within the Hadoop ecosystem. The Hive metadata store (also called the metastore) can use an embedded, local, or remote database. Hive servers are built on Apache Thrift Server technology. Since Hive 0.11, HiveServer2 has been available to handle multiple concurrent clients. It supports Kerberos, LDAP, and custom pluggable authentication, and provides better options for JDBC and ODBC clients, especially for metadata access.

Hive architecture
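As a rough illustration of how a JDBC client addresses HiveServer2, the following Python sketch assembles a connection URL. It assumes the usual `jdbc:hive2://host:port/database` form, the commonly used default port 10000, and the `principal` URL parameter for Kerberos. The function name, host, and principal are hypothetical, and real deployments may need further parameters.

```python
def hiveserver2_jdbc_url(host, port=10000, database="default",
                         kerberos_principal=None):
    """Build a HiveServer2 JDBC URL (a sketch, not an exhaustive builder).

    port=10000 is the commonly used HiveServer2 default (an assumption here).
    kerberos_principal, when given, is appended as the `principal` parameter.
    """
    url = f"jdbc:hive2://{host}:{port}/{database}"
    if kerberos_principal:
        url += f";principal={kerberos_principal}"
    return url


if __name__ == "__main__":
    # Hypothetical host and Kerberos principal, for illustration only.
    print(hiveserver2_jdbc_url("hs2.example.com"))
    print(hiveserver2_jdbc_url(
        "hs2.example.com",
        kerberos_principal="hive/hs2.example.com@EXAMPLE.COM",
    ))
```

The resulting string is what a JDBC-based tool would be given as its connection URL. With LDAP or custom authentication, credentials are typically supplied separately by the client rather than through the principal parameter.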

Here are some highlights of Hive that we can keep in mind moving forward:

  • Hive provides a simpler query model with less coding than MapReduce
  • HQL and SQL have similar syntax
  • Hive provides lots of functions that lead to easier analytics usage
  • Response times are typically much faster than those of other types of queries on similarly huge datasets
  • Hive supports running on different computing frameworks
  • Hive supports ad hoc querying of data on HDFS
  • Hive supports user-defined functions, scripts, and customized I/O formats to extend its functionality
  • Hive is scalable and extensible to various types of data and bigger datasets
  • Mature JDBC and ODBC drivers allow many applications to pull Hive data for seamless reporting
  • Hive allows users to read data in arbitrary formats, using SerDes and Input/Output formats
  • Hive has a well-defined architecture for metadata management, authentication, and query optimizations
  • There is a big community of practitioners and developers working on and using Hive