You're reading from Amazon Redshift Cookbook Recipes for building modern data warehousing solutions

Product type Paperback

Published in Jul 2021

Publisher Packt

ISBN-13 9781800569683

Length 384 pages

Edition 1st Edition

Languages

Python

Tools

Amazon Redshift

Concepts

Data Analysis

Authors (3):

Shruti Worlikar

Harshida Patel

Thiyagarajan Arumugam

View More author details

Table of Contents (13) Chapters

Preface

1. Chapter 1: Getting Started with Amazon Redshift

2. Chapter 2: Data Management FREE CHAPTER

3. Chapter 3: Loading and Unloading Data

4. Chapter 4: Data Pipelines

5. Chapter 5: Scalable Data Orchestration for Automation

6. Chapter 6: Data Authorization and Security

7. Chapter 7: Performance Optimization

8. Chapter 8: Cost Optimization

9. Chapter 9: Lake House Architecture

10. Chapter 10: Extending Redshift's Capabilities

11. Other Books You May Enjoy

Appendix

Managing UDFs

Scalar UDF functions in Amazon Redshift are routines that are able to take parameters, perform calculations, and return the results. UDFs are handy when performing complex calculations that can be stored and reused in a SQL statement. Amazon Redshift supports UDFs that can be authored using either Python or SQL. In addition, Amazon Redshift also supports AWS Lambda UDFs that open up further possibilities to invoke other AWS services. For example, let's say the latest customer address information is stored in AWS DynamoDB—you can invoke an AWS Lambda UDF to retrieve this using a SQL statement in Amazon Redshift.

Getting ready

To complete this recipe, you will need the following:

Access to the AWS console
Access to any SQL interface such as a SQL client or query editor
Access to create an AWS Lambda function
Access to create an Identity and Access Management (IAM) role that can invoke AWS Lambda and attach it to Amazon Redshift

How to do it…

In this recipe, we will start with a scalar Python-based UDF that will be used to parse an XML input:

Connect to Amazon Redshift using the SQL client, and copy and paste the following code to create an f_parse_xml function:

CREATE OR REPLACE FUNCTION f_parse_xml
(xml VARCHAR(MAX), input_rank int)
RETURNS varchar(max)
STABLE
AS $$
    import xml.etree.ElementTree as ET
    root = ET.fromstring(xml)
    res = ''
    for country in root.findall('country'):
        rank = country.find('rank').text
        if rank == input_rank:
           res =  name = country.get('name') + ':' + rank
           break
    return res 
$$ LANGUAGE plpythonu;

Important note

The preceding Python-based UDF takes in the XML data and uses the xml.etree.ElementTree library to parse it to locate an element, using the input rank. See https://docs.python.org/3/library/xml.etree.elementtree.html for more options that are available with this XML library.

Now, let's validate the f_parse_xml function using the following statement, by locating the country name that has the rank 2:

select 
f_parse_xml('<data>      <country name="Liechtenstein">        <rank>2</rank>         <year>2008</year>         <gdppc>141100</gdppc>         <neighbor name="Austria" direction="E"/>        <neighbor name="Switzerland" direction="W"/> </country></data>', '2') as col1

This is the expected output:

col1
Liechtenstein:2

We will now create another AWS Lambda-based UDF. Navigate to the AWS Management Console and pick the AWS Lambda service and click on Create function, as shown in the following screenshot:
Figure 2.1 – Creating a Lambda function using the AWS Management Console
In the Create function screen, enter rs_lambda under Function name, choose a Python 3.6 runtime, and click on Create function.
Under the Function code textbox, copy and paste the following code and press the Deploy button:
```
import json
def lambda_handler(event, context):
    ret = dict()
    ret['success'] = True
    ret['results'] = ["bar"]
    ret['error_msg'] = "none"
    ret['num_records'] = 1
    return json.dumps(ret)
```
In the preceding Python-based Lambda function, a sample result is returned. This function can further be integrated to call any other AWS service—for example, you can invoke AWS Key Management Service (KMS) to encrypt input data.

Navigate to AWS IAM in the AWS Management Console and create a new role, RSInvokeLambda, using the following policy statement by replacing [Your_AWS_Account_Number], [Your_AWS_Region] with your AWS account number/region and attaching the role to the Amazon Redshift cluster:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "lambda:InvokeFunction",
            "Resource": "arn:aws:lambda:[Your_AWS_Region]: [Your_AWS_Account_Number]:function:rs_lambda"
        }
    ]
}

Connect to Amazon Redshift using the SQL client, and copy and paste the following code to create a f_redshift_lambda function that links the AWS Lambda rs_lambda function:

CREATE OR REPLACE EXTERNAL FUNCTION f_redshift_lambda (bar varchar)
RETURNS varchar STABLE
LAMBDA 'rs_lambda'
IAM_ROLE 'arn:aws:iam::[Your_AWS_Account_Number]:role/RSInvokeLambda';

You can validate the f_redshift_lambda function by using the following SQL statement:
```
select f_redshift_lambda ('input_str') as col1
--output
col1
bar
```

Amazon Redshift is now able to invoke the AWS Lambda function using a SQL statement.

How it works…

Amazon Redshift allows you to create a scalar UDF using either a SQL SELECT clause or a Python program in addition to the AWS Lambda UDF illustrated in this recipe. The scalar UDFs are stored with Amazon Redshift and are available to any user when granted the required access. You can find a collection of several ready-to-use UDFs that can be used to implement some of the complex reusable logic within a SQL statement at the following link: https://github.com/aws-samples/amazon-redshift-udfs.

You're reading from Amazon Redshift Cookbook Recipes for building modern data warehousing solutions

Table of Contents (13) Chapters

Managing UDFs

Getting ready

How to do it…

How it works…

Authors (3)

Other recommended products

Personalised recommendations for you