UDF, UDAF, and UDTF
As in Pig, UDFs are one of the most important extensibility features in Hive. Writing a UDF in Hive is simpler than in Pig, but the interfaces do not declare every method that a complete UDF must provide. Because a UDF can take any number of parameters, it is difficult to provide a fixed interface. Instead, Hive uses Java reflection when executing the UDF to determine the parameter list for the function.
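The reflection idea can be illustrated without Hive on the classpath. The following is a minimal sketch in plain Java, not Hive's actual resolution code: the class names ReflectiveDispatchDemo and ToUpperOrAdd and the helper call are hypothetical, and the lookup matches argument types exactly, which is stricter than a real engine would need to be. It shows how a caller can pick an overload of a method named evaluate at runtime when no interface fixes its signature.

```java
import java.lang.reflect.Method;

// Hypothetical illustration only; this is not Hive's implementation.
class ReflectiveDispatchDemo {

    // A UDF-like class: no interface dictates the signature of evaluate(),
    // so it can be overloaded with any parameter list.
    public static class ToUpperOrAdd {
        public String evaluate(String s) {
            return s.toUpperCase();
        }

        public Integer evaluate(Integer a, Integer b) {
            return a + b;
        }
    }

    // Looks up the evaluate() overload whose parameter types exactly match
    // the runtime types of the arguments, then invokes it.
    // Assumes no argument is null (a null has no runtime type to inspect).
    static Object call(Object udf, Object... args) {
        Class<?>[] types = new Class<?>[args.length];
        for (int i = 0; i < args.length; i++) {
            types[i] = args[i].getClass();
        }
        try {
            Method m = udf.getClass().getMethod("evaluate", types);
            return m.invoke(udf, args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("No matching evaluate() overload", e);
        }
    }

    public static void main(String[] cmdArgs) {
        ToUpperOrAdd udf = new ToUpperOrAdd();
        System.out.println(call(udf, "hive")); // prints HIVE
        System.out.println(call(udf, 2, 3));   // prints 5
    }
}
```

The trade-off of this style is that a mismatch between the arguments and the available overloads surfaces only at runtime, not at compile time.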
Hive has the following three kinds of UDFs:
Regular UDFs: These UDFs take in a single row and produce a single row after application of the custom logic.
UDAFs: These are aggregators that take in multiple rows but output a single row. SUM and COUNT are examples of in-built UDAFs.
UDTFs: These are generator functions that take in a single row and produce multiple rows as outputs. The EXPLODE function is a UDTF.
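The three kinds differ only in how many rows go in and how many come out. The sketch below mimics those shapes with ordinary Java methods; it does not use any Hive API, and the class UdfShapesDemo and its methods length, sum, and explode are hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the three row shapes; no Hive classes involved.
class UdfShapesDemo {

    // Regular UDF shape: one row in, one row out.
    static int length(String row) {
        return row.length();
    }

    // UDAF shape: many rows in, one row out (similar in spirit to SUM).
    static int sum(List<Integer> rows) {
        int total = 0;
        for (int value : rows) {
            total += value;
        }
        return total;
    }

    // UDTF shape: one row in, many rows out (similar in spirit to EXPLODE).
    // Assumes the row is a comma-separated string.
    static List<String> explode(String row) {
        return Arrays.asList(row.split(","));
    }

    public static void main(String[] args) {
        System.out.println(length("hive"));                // 4
        System.out.println(sum(Arrays.asList(1, 2, 3)));   // 6
        System.out.println(explode("a,b,c"));              // [a, b, c]
    }
}
```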
The following code example shows how a simple UDF is written. Every UDF is extended from the UDF class present in org.apache.hadoop...