Writing a user-defined function in Hive
In the previous chapter, we talked about how to write user-defined functions (UDFs) in Pig; in this recipe, we are going to do the same for Hive. Hive supports adding temporary functions, which can then be used to process data. We will write a UDF in Java and register it as a function that can be called during data processing.
Getting ready
To perform this recipe, you should have a running Hadoop cluster as well as the latest version of Hive installed on it. Here, I am using Hive 1.2.1. We will also need the Eclipse IDE for development.
How to do it
Hive supports a wide range of built-in functions, but sometimes you will need logic that those built-in functions cannot handle. In that case, we need to write a custom function.
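As a rough sketch of what such a custom function looks like, the class below follows the classic Hive 1.x simple-UDF convention: Hive calls a public method named `evaluate` once per row, locating it by reflection. The class name `TrimLowerUDF` and its trim-and-lowercase logic are hypothetical and only illustrate the shape. In a real project the class would extend `org.apache.hadoop.hive.ql.exec.UDF` with the `hive-exec` JAR on the classpath. That dependency is left out here so the sketch compiles on its own. Many Hive examples use `org.apache.hadoop.io.Text` for arguments and return values, while this sketch assumes plain `String`.

```java
// Hypothetical example UDF. In an actual Hive 1.x project it would be declared as
//   public class TrimLowerUDF extends org.apache.hadoop.hive.ql.exec.UDF
// with hive-exec on the classpath; the Hive dependency is omitted so this compiles standalone.
public class TrimLowerUDF {

    // Hive looks up a method named exactly "evaluate" by reflection and calls it per row.
    public String evaluate(String input) {
        // A SQL NULL arrives as a Java null; returning null produces NULL in the query result.
        if (input == null) {
            return null;
        }
        return input.trim().toLowerCase();
    }

    public static void main(String[] args) {
        TrimLowerUDF udf = new TrimLowerUDF();
        System.out.println(udf.evaluate("  Hello Hive  ")); // prints "hello hive"
        System.out.println(udf.evaluate(null));             // prints "null"
    }
}
```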
Consider a situation where we have census data that includes each person's income, and we want to categorize people into three groups based on that income. The following is some sample data where we have the...
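A three-way income categorization like this could be sketched as a single `evaluate` method. This is a minimal sketch under stated assumptions, not the recipe's actual implementation. The class name `IncomeCategoryUDF`, the labels `LOW`/`MEDIUM`/`HIGH`, and the boundaries of 30,000 and 100,000 are all hypothetical, because this excerpt does not show the real categories. It also assumes the income column is a `BIGINT`, which maps to `java.lang.Long`. As before, a real Hive 1.x UDF would extend `org.apache.hadoop.hive.ql.exec.UDF`.

```java
// Hypothetical income-band UDF. Class name, labels, and thresholds are assumptions;
// in Hive 1.x this class would extend org.apache.hadoop.hive.ql.exec.UDF.
public class IncomeCategoryUDF {

    // Assumed upper bounds (exclusive) for the first two groups.
    private static final long LOW_UPPER_BOUND = 30000L;
    private static final long MEDIUM_UPPER_BOUND = 100000L;

    // Called once per row with the income value (assumed BIGINT -> Long).
    public String evaluate(Long income) {
        if (income == null) {
            return null; // preserve SQL NULL semantics
        }
        if (income < LOW_UPPER_BOUND) {
            return "LOW";
        } else if (income < MEDIUM_UPPER_BOUND) {
            return "MEDIUM";
        }
        return "HIGH";
    }

    public static void main(String[] args) {
        IncomeCategoryUDF udf = new IncomeCategoryUDF();
        // Made-up inputs for illustration only, not taken from the census data.
        long[] samples = {12000L, 45000L, 250000L};
        for (long s : samples) {
            System.out.println(s + " -> " + udf.evaluate(s));
        }
    }
}
```

Once such a class is packaged into a JAR, Hive's `ADD JAR` and `CREATE TEMPORARY FUNCTION <name> AS '<fully.qualified.ClassName>'` statements would make it callable from a query. The function name you choose there is up to you. Keeping the thresholds as named constants makes them easy to adjust once the real category boundaries are known.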