Managing UDFs
Scalar UDF functions in Amazon Redshift are routines that are able to take parameters, perform calculations, and return the results. UDFs are handy when performing complex calculations that can be stored and reused in a SQL statement. Amazon Redshift supports UDFs that can be authored using either Python or SQL. In addition, Amazon Redshift also supports AWS Lambda UDFs that open up further possibilities to invoke other AWS services. For example, let's say the latest customer address information is stored in AWS DynamoDB—you can invoke an AWS Lambda UDF to retrieve this using a SQL statement in Amazon Redshift.
Getting ready
To complete this recipe, you will need the following:
- Access to the AWS console
- Access to any SQL interface such as a SQL client or query editor
- Access to create an AWS Lambda function
- Access to create an Identity and Access Management (IAM) role that can invoke AWS Lambda and attach it to Amazon Redshift
How to do it…
In this recipe, we will start with a scalar Python-based UDF that will be used to parse an XML input:
- Connect to Amazon Redshift using the SQL client, and copy and paste the following code to create an
f_parse_xml
function:CREATE OR REPLACE FUNCTION f_parse_xml (xml VARCHAR(MAX), input_rank int) RETURNS varchar(max) STABLE AS $$ import xml.etree.ElementTree as ET root = ET.fromstring(xml) res = '' for country in root.findall('country'): rank = country.find('rank').text if rank == input_rank: res = name = country.get('name') + ':' + rank break return res $$ LANGUAGE plpythonu;
Important note
The preceding Python-based UDF takes in the XML data and uses the
xml.etree.ElementTree
library to parse it to locate an element, using the input rank. See https://docs.python.org/3/library/xml.etree.elementtree.html for more options that are available with this XML library. - Now, let's validate the
f_parse_xml
function using the following statement, by locating the country name that has the rank 2:select f_parse_xml('<data> <country name="Liechtenstein"> <rank>2</rank> <year>2008</year> <gdppc>141100</gdppc> <neighbor name="Austria" direction="E"/> <neighbor name="Switzerland" direction="W"/> </country></data>', '2') as col1
This is the expected output:
col1 Liechtenstein:2
- We will now create another AWS Lambda-based UDF. Navigate to the AWS Management Console and pick the AWS Lambda service and click on Create function, as shown in the following screenshot:
- In the Create function screen, enter
rs_lambda
under Function name, choose a Python 3.6 runtime, and click on Create function. - Under the Function code textbox, copy and paste the following code and press the Deploy button:
import json def lambda_handler(event, context): ret = dict() ret['success'] = True ret['results'] = ["bar"] ret['error_msg'] = "none" ret['num_records'] = 1 return json.dumps(ret)
In the preceding Python-based Lambda function, a sample result is returned. This function can further be integrated to call any other AWS service—for example, you can invoke AWS Key Management Service (KMS) to encrypt input data.
- Navigate to AWS IAM in the AWS Management Console and create a new role,
RSInvokeLambda
, using the following policy statement by replacing[Your_AWS_Account_Number], [Your_AWS_Region]
with your AWS account number/region and attaching the role to the Amazon Redshift cluster:{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": "lambda:InvokeFunction", "Resource": "arn:aws:lambda:[Your_AWS_Region]: [Your_AWS_Account_Number]:function:rs_lambda" } ] }
- Connect to Amazon Redshift using the SQL client, and copy and paste the following code to create a
f_redshift_lambda
function that links the AWS Lambdars_lambda
function:CREATE OR REPLACE EXTERNAL FUNCTION f_redshift_lambda (bar varchar) RETURNS varchar STABLE LAMBDA 'rs_lambda' IAM_ROLE 'arn:aws:iam::[Your_AWS_Account_Number]:role/RSInvokeLambda';
- You can validate the
f_redshift_lambda
function by using the following SQL statement:select f_redshift_lambda ('input_str') as col1 --output col1 bar
Amazon Redshift is now able to invoke the AWS Lambda function using a SQL statement.
How it works…
Amazon Redshift allows you to create a scalar UDF using either a SQL SELECT
clause or a Python program in addition to the AWS Lambda UDF illustrated in this recipe. The scalar UDFs are stored with Amazon Redshift and are available to any user when granted the required access. You can find a collection of several ready-to-use UDFs that can be used to implement some of the complex reusable logic within a SQL statement at the following link: https://github.com/aws-samples/amazon-redshift-udfs.