User-defined functions
Hive defines the following three types of UDF:
UDFs: These are regular user-defined functions that operate row-wise and output one result for each row, such as most built-in mathematical and string functions.
UDAFs: These are user-defined aggregating functions that operate row-wise or group-wise and output one row, or one row for each group, as a result, such as the MAX and COUNT built-in functions.
UDTFs: These are user-defined table-generating functions that also operate row-wise, but they produce multiple rows/tables as a result, such as the EXPLODE function. A UDTF can be used either after SELECT or after the LATERAL VIEW statement.
Note
Hive is implemented in Java, so UDFs are normally written in Java as well. Because Java supports running code in other languages through the javax.script API (see http://docs.oracle.com/javase/6/docs/api/javax/script/package-summary.html), UDFs can also be written in languages other than Java. In this book, we only focus on Java UDFs.
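To make the difference between the three function types concrete, the following is a minimal plain-Java sketch of their input/output shapes. It deliberately does not use Hive's own UDF, UDAF, or UDTF classes; the class name UdfShapes and the methods upper, max, and explode are hypothetical names chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Illustration only: these methods mimic the three shapes in plain Java.
// They are NOT Hive's UDF/UDAF/UDTF base classes or APIs.
final class UdfShapes {

    // UDF shape: one value from one row in, one value out.
    static String upper(String value) {
        return value == null ? null : value.toUpperCase(Locale.ROOT);
    }

    // UDAF shape: the values of a whole group in, one value out.
    static Integer max(List<Integer> group) {
        Integer best = null;
        for (Integer v : group) {
            // Skip nulls; keep the largest value seen so far.
            if (v != null && (best == null || v > best)) {
                best = v;
            }
        }
        return best; // stays null when the group has no non-null values
    }

    // UDTF shape: one array value from one row in, one output row per element.
    static List<String> explode(List<String> array) {
        // In this sketch, a null input simply yields no rows.
        return array == null ? new ArrayList<>() : new ArrayList<>(array);
    }

    public static void main(String[] args) {
        System.out.println(upper("hive"));
        System.out.println(max(Arrays.asList(3, 7, 5)));
        for (String row : explode(Arrays.asList("a", "b", "c"))) {
            System.out.println(row);
        }
    }
}
```

The point of the sketch is only the signatures: a UDF maps one row to one value, a UDAF folds many rows into one value, and a UDTF expands one row into many.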
We'll start...