Storing and processing Hive data in the Parquet file format
Most of the time, you will have created Hive tables and stored your data in a text format; in this recipe, we are going to learn how to store data in Parquet files instead.
Getting ready
To perform this recipe, you should have a running Hadoop cluster with Hive installed on it. Here, I am going to use Hive 1.2.1.
How to do it...
Hive 1.2.1 supports various file formats, which help process data more efficiently. Parquet is a file format accepted across the Hadoop ecosystem and can be used with Hive, MapReduce, Pig, Impala, and so on. Before we can store data in Parquet, we need a Hive table that holds the data in a text format; we will reuse the table that we created in the first recipe.
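For reference, the text-format source table would look something like the following sketch. The table name employee is an assumption here, since the first recipe is not shown, but the columns match the Parquet DDL that follows:

-- assumed text-format source table from the first recipe
-- (the table name 'employee' is an assumption; the columns
-- mirror the Parquet table created below)
create table employee(
    id int,
    name string)
row format delimited
fields terminated by '|'
stored as textfile;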
Creating a table stored as Parquet is straightforward, as shown here:
-- no row format clause is needed here: Parquet is a binary,
-- columnar format, so field delimiters do not apply
create table employee_par(
    id int,
    name string)
stored as PARQUET;
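Once the table is created, you can confirm that Hive will really read and write Parquet by inspecting its metadata:

-- for a Parquet table, the InputFormat/OutputFormat entries in the
-- output point to the Parquet classes (e.g., MapredParquetInputFormat)
describe formatted employee_par;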
To insert data into the Parquet table, we copy the rows over from the existing text-format table; Hive converts them to Parquet on write.
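A minimal sketch, assuming the text-format source table is named employee as above:

-- copy rows from the text table; Hive writes them out as Parquet
insert overwrite table employee_par select * from employee;

-- quick sanity check on the new Parquet table
select * from employee_par limit 5;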