Processing JSON data in Hive using JSON SerDe
These days, JSON is a very common data format that's used for data communication and storage. Its key-value structure gives great flexibility in handling data. In this recipe, we are going to take a look at how to process data stored in the JSON format in Hive. Hive's default text row format does not parse JSON records into columns, so we will be using JSON SerDe. SerDe is short for serializer/deserializer: a component that tells Hive how to read and write data.
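To make the role of a SerDe concrete, here is a minimal HiveQL sketch of how a table is typically bound to one. It assumes the SerDe class `org.openx.data.jsonserde.JsonSerDe` from the Hive-JSON-Serde project; the JAR path, table name, and columns are hypothetical placeholders chosen only for illustration.

```sql
-- Hypothetical path: point this at wherever the SerDe JAR actually lives.
ADD JAR /path/to/json-serde-with-dependencies.jar;

-- Hypothetical table; the underlying files hold one JSON object per line,
-- for example: {"id": 1, "name": "Asha", "salary": 62000.0}
CREATE TABLE employees_json (
  id     INT,
  name   STRING,
  salary DOUBLE
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS TEXTFILE;

-- On read, the deserializer maps each JSON key to the column of the same name.
SELECT name, salary
FROM employees_json
WHERE salary > 50000;
```

The `ROW FORMAT SERDE` clause is what hands reading and writing of each row over to the named class, so the rest of the query can treat the JSON fields as ordinary columns.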
Getting ready
To perform this recipe, you should have a running Hadoop cluster with the latest version of Hive installed on it. Here, I am using Hive 1.2.1. Apart from Hive, we also need JSON SerDe.
Several JSON SerDe binaries are available from different developers. The most popular, though, can be found at https://github.com/rcongiu/Hive-JSON-Serde. This project contains code for JSON SerDe and is compatible with the latest version of Hive. You can either download the code...