Running custom functions
While Spark SQL doesn't support as wide a range of functions as ANSI SQL does, it has an easy and powerful mechanism for registering an ordinary Scala function and using it inside the SQL context.
Let's say we would like to find out how many profiles fall under each age group. We have a simple function called ageGroup. Given an age, it returns a string representing the age group:
def ageGroup(age: Int): String = {
  val buckets = Array("0-10", "11-20", "21-30", "31-40", "41-50",
    "51-60", "61-70", "71-80", "81-90", "91-100", ">100")
  // Integer division maps an age to its bucket; anything above 100 lands in the last bucket
  buckets(math.min((age - 1) / 10, buckets.length - 1))
}
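As a quick sanity check, a few sample calls (expected results shown as comments) confirm the bucketing:

ageGroup(8)    // "0-10"
ageGroup(25)   // "21-30"
ageGroup(104)  // ">100"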
Now, in order to register this function to be used inside Spark SQL, all that we need to do is give it a name and call the register method of the SQLContext's user-defined function object:
sqlc.udf.register("fnGroupAge", (age: Long) => ageGroup(age.toInt))
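The registered function is not limited to the %sql interpreter; it can also be invoked through the DataFrame API. A minimal sketch, assuming (hypothetically) that the profile data is held in a DataFrame named profilesDF:

// profilesDF is a hypothetical DataFrame with an "age" column
profilesDF.selectExpr("fnGroupAge(age) as ageGroup").show()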
Let's fire our query and see the use of the function in action:
%sql select fnGroupAge(age) as ageGroup...
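The query above is truncated; one complete form it could take, sketched under the assumption that the profile data has been registered as a temporary table named profiles, is:

%sql
select fnGroupAge(age) as ageGroup, count(*) as profileCount
from profiles
group by fnGroupAge(age)
order by ageGroup

Note that the grouping expression repeats fnGroupAge(age) rather than the select alias, since Spark SQL does not allow grouping by a column alias defined in the same select.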