Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Newsletter Hub
Free Learning
Arrow right icon
timer SALE ENDS IN
0 Days
:
00 Hours
:
00 Minutes
:
00 Seconds
Arrow up icon
GO TO TOP
Amazon Redshift Cookbook

You're reading from   Amazon Redshift Cookbook Recipes for building modern data warehousing solutions

Arrow left icon
Product type Paperback
Published in Jul 2021
Publisher Packt
ISBN-13 9781800569683
Length 384 pages
Edition 1st Edition
Languages
Arrow right icon
Authors (3):
Arrow left icon
Shruti Worlikar Shruti Worlikar
Author Profile Icon Shruti Worlikar
Shruti Worlikar
Harshida Patel Harshida Patel
Author Profile Icon Harshida Patel
Harshida Patel
Thiyagarajan Arumugam Thiyagarajan Arumugam
Author Profile Icon Thiyagarajan Arumugam
Thiyagarajan Arumugam
Arrow right icon
View More author details
Toc

Table of Contents (13) Chapters Close

Preface 1. Chapter 1: Getting Started with Amazon Redshift 2. Chapter 2: Data Management FREE CHAPTER 3. Chapter 3: Loading and Unloading Data 4. Chapter 4: Data Pipelines 5. Chapter 5: Scalable Data Orchestration for Automation 6. Chapter 6: Data Authorization and Security 7. Chapter 7: Performance Optimization 8. Chapter 8: Cost Optimization 9. Chapter 9: Lake House Architecture 10. Chapter 10: Extending Redshift's Capabilities 11. Other Books You May Enjoy Appendix

Managing UDFs

Scalar UDF functions in Amazon Redshift are routines that are able to take parameters, perform calculations, and return the results. UDFs are handy when performing complex calculations that can be stored and reused in a SQL statement. Amazon Redshift supports UDFs that can be authored using either Python or SQL. In addition, Amazon Redshift also supports AWS Lambda UDFs that open up further possibilities to invoke other AWS services. For example, let's say the latest customer address information is stored in AWS DynamoDB—you can invoke an AWS Lambda UDF to retrieve this using a SQL statement in Amazon Redshift.

Getting ready

To complete this recipe, you will need the following:

  • Access to the AWS console
  • Access to any SQL interface such as a SQL client or query editor
  • Access to create an AWS Lambda function
  • Access to create an Identity and Access Management (IAM) role that can invoke AWS Lambda and attach it to Amazon Redshift

How to do it…

In this recipe, we will start with a scalar Python-based UDF that will be used to parse an XML input:

  1. Connect to Amazon Redshift using the SQL client, and copy and paste the following code to create an f_parse_xml function:
    CREATE OR REPLACE FUNCTION f_parse_xml
    (xml VARCHAR(MAX), input_rank int)
    RETURNS varchar(max)
    STABLE
    AS $$
        import xml.etree.ElementTree as ET
        root = ET.fromstring(xml)
        res = ''
        for country in root.findall('country'):
            rank = country.find('rank').text
            if rank == input_rank:
               res =  name = country.get('name') + ':' + rank
               break
        return res 
    $$ LANGUAGE plpythonu;

    Important note

    The preceding Python-based UDF takes in the XML data and uses the xml.etree.ElementTree library to parse it to locate an element, using the input rank. See https://docs.python.org/3/library/xml.etree.elementtree.html for more options that are available with this XML library.

  2. Now, let's validate the f_parse_xml function using the following statement, by locating the country name that has the rank 2:
    select 
    f_parse_xml('<data>      <country name="Liechtenstein">        <rank>2</rank>         <year>2008</year>         <gdppc>141100</gdppc>         <neighbor name="Austria" direction="E"/>        <neighbor name="Switzerland" direction="W"/> </country></data>', '2') as col1

    This is the expected output:

    col1
    Liechtenstein:2
  3. We will now create another AWS Lambda-based UDF. Navigate to the AWS Management Console and pick the AWS Lambda service and click on Create function, as shown in the following screenshot:
    Figure 2.1 – Creating a Lambda function using the AWS Management Console

    Figure 2.1 – Creating a Lambda function using the AWS Management Console

  4. In the Create function screen, enter rs_lambda under Function name, choose a Python 3.6 runtime, and click on Create function.
  5. Under the Function code textbox, copy and paste the following code and press the Deploy button:
    import json
    def lambda_handler(event, context):
        ret = dict()
        ret['success'] = True
        ret['results'] = ["bar"]
        ret['error_msg'] = "none"
        ret['num_records'] = 1
        return json.dumps(ret)

    In the preceding Python-based Lambda function, a sample result is returned. This function can further be integrated to call any other AWS service—for example, you can invoke AWS Key Management Service (KMS) to encrypt input data.

  6. Navigate to AWS IAM in the AWS Management Console and create a new role, RSInvokeLambda, using the following policy statement by replacing [Your_AWS_Account_Number], [Your_AWS_Region] with your AWS account number/region and attaching the role to the Amazon Redshift cluster:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "lambda:InvokeFunction",
                "Resource": "arn:aws:lambda:[Your_AWS_Region]: [Your_AWS_Account_Number]:function:rs_lambda"
            }
        ]
    }
  7. Connect to Amazon Redshift using the SQL client, and copy and paste the following code to create a f_redshift_lambda function that links the AWS Lambda rs_lambda function:
    CREATE OR REPLACE EXTERNAL FUNCTION f_redshift_lambda (bar varchar)
    RETURNS varchar STABLE
    LAMBDA 'rs_lambda'
    IAM_ROLE 'arn:aws:iam::[Your_AWS_Account_Number]:role/RSInvokeLambda';
  8. You can validate the f_redshift_lambda function by using the following SQL statement:
    select f_redshift_lambda ('input_str') as col1
    --output
    col1
    bar

Amazon Redshift is now able to invoke the AWS Lambda function using a SQL statement.

How it works…

Amazon Redshift allows you to create a scalar UDF using either a SQL SELECT clause or a Python program in addition to the AWS Lambda UDF illustrated in this recipe. The scalar UDFs are stored with Amazon Redshift and are available to any user when granted the required access. You can find a collection of several ready-to-use UDFs that can be used to implement some of the complex reusable logic within a SQL statement at the following link: https://github.com/aws-samples/amazon-redshift-udfs.

You have been reading a chapter from
Amazon Redshift Cookbook
Published in: Jul 2021
Publisher: Packt
ISBN-13: 9781800569683
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime
Banner background image