A collection of user-defined functions (UDFs) for Amazon Redshift. The intent of this collection is to provide examples for defining Python UDFs, as well as useful functions which extend Amazon Redshift's capabilities and support migrations from legacy DB platforms.
Each function is allocated a folder. At a minimum, each function will have the following files, which are used by the `deployFunction.sh` and `testFunction.sh` scripts:
- `function.sql` - the SQL script to be run in the Redshift DB which creates the UDF. If a Lambda function, use the string `:RedshiftRole` for the IAM role to be passed in by the deployment script.
- `input.csv` - a list of sample input parameters to the function, delimited by commas (`,`), with strings denoted by single quotes.
- `output.csv` - a list of expected output values from the function.
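To make the layout concrete, a minimal folder for a hypothetical SQL UDF `f_multiply_int(int,int)` (the function name and sample values are illustrative, not part of the repository) might be built like this:

```shell
# Hypothetical folder for a SQL UDF named f_multiply_int(int,int).
dir="sql-udfs/f_multiply_int(int,int)"
mkdir -p "$dir"

# function.sql creates the UDF in Redshift (SQL UDFs use positional $1, $2).
cat > "$dir/function.sql" <<'SQL'
CREATE OR REPLACE FUNCTION f_multiply_int(int, int)
RETURNS int
STABLE
AS $$
  SELECT $1 * $2
$$ LANGUAGE sql;
SQL

# input.csv: one row of comma-delimited parameters per test case
# (string parameters would be single-quoted, e.g. 'abc').
printf '2,3\n4,5\n' > "$dir/input.csv"

# output.csv: the expected result for each input row, in the same order.
printf '6\n20\n' > "$dir/output.csv"

# Sanity check: every input row needs a matching expected output row.
test "$(wc -l < "$dir/input.csv")" -eq "$(wc -l < "$dir/output.csv")" && echo "rows match"
```

Keeping the input and output files row-aligned is what allows the test script to compare results positionally.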
Python UDFs may include the following additional file:
- `requirements.txt` - a list of modules, if your function requires modules not already available in Redshift. The modules will be packaged, uploaded to S3, and mapped to a library in Redshift.
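That packaging flow can be sketched roughly as below. The module name, S3 location, and exact commands are assumptions (the real work happens in the `libraryInstall.sh` script); the `pip`/`aws` steps are echoed rather than executed:

```shell
# Rough sketch of the requirements.txt flow; commands are echoed, not run.
module="ua-parser"
lib="${module//-/_}"                 # library names cannot contain '-'
S3_LOC="s3://example-bucket/libs"    # hypothetical staging location

echo "pip wheel $module -w ./wheels"
echo "aws s3 cp ./wheels/ $S3_LOC/ --recursive"

# Redshift then maps the uploaded package to a library:
ddl="CREATE LIBRARY $lib LANGUAGE plpythonu FROM '$S3_LOC/$module.zip' IAM_ROLE '<redshift role arn>';"
echo "$ddl"
```

The `<redshift role arn>` placeholder is filled in by the deployment script from `$REDSHIFT_ROLE`.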
Lambda UDFs may include the following additional file:
- `lambda.yaml` - [REQUIRED] a CFN template containing the Lambda function. The Lambda function name should match the Redshift function name with `_` replaced with `-`, e.g. `f-upper-python-varchar`. The template may contain additional AWS services required by the Lambda function and should contain an IAM Role which can be assumed by the Lambda service and which grants access to those additional services (if applicable). In a production deployment, be sure to add resource restrictions to the Lambda IAM Role to ensure the permissions are scoped down.
- `resources.yaml` - a CFN template containing external resources which may be referenced by the Lambda function. These resources are for testing only.
- `package.json` - (NodeJS only) a list of modules, if your function requires modules not already available in Lambda. The modules will be packaged, uploaded to S3, and mapped to your Lambda function. See `f_mysql_lookup_nodejs` for an example.
- `index.js` - (NodeJS only) your JavaScript handler code. See `f_upper_nodejs` for an example.
- `pom.xml` - (Java only) lists dependencies as well as the name of your handler function. See `f_upper_java` for an example, and the Maven documentation for more details on writing a `pom.xml` file.
- `src/main/java/function/Handler.java` - (Java only) your Java handler code. See `f_upper_java` for an example.
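The Lambda naming convention can be derived mechanically from the Redshift function name. A small sketch (assuming, as in the example name above, that the argument type is folded into the name):

```shell
# Lambda function names replace '_' with '-' (bash parameter expansion).
redshift_fn="f_upper_python_varchar"   # hypothetical flattened signature
lambda_fn="${redshift_fn//_/-}"
echo "$lambda_fn"   # -> f-upper-python-varchar
```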
SQL UDFs do not require any additional files.
Located in the `bin` directory are tools to deploy and test your UDFs.
The `deployFunction.sh` script orchestrates the deployment of the UDF to your AWS environment. This includes:
- Looping through modules in a `requirements.txt` file (if present) and installing them using the `libraryInstall.sh` script, which uploads the packages to the `$S3_LOC` and creates the library in Redshift using the `$REDSHIFT_ROLE`.
- If deploying a NodeJS Lambda UDF, using `package.json` to run `npm install`, packaging the code, and uploading the `zip` file to the `$S3_LOC`.
- If deploying a Java Lambda UDF, using `pom.xml` to run `mvn package`, packaging the code, and uploading the `jar` to the `$S3_LOC`.
- If deploying a Lambda UDF, using `lambda.yaml` to run `aws cloudformation deploy` and build the needed resources.
- Creating the UDF in Redshift by executing the `function.sql` script, replacing the `:RedshiftRole` parameter if deploying a Lambda UDF.
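A heavily simplified sketch of that branching logic follows. The helper names and messages are assumptions and nothing is actually deployed; the real script also handles credentials, packaging, and SQL execution:

```shell
# Simplified sketch of deployFunction.sh's per-file branching (illustrative only).
deploy_udf() {
  local udf_dir="$1"
  [ -f "$udf_dir/requirements.txt" ] && echo "install python libraries via libraryInstall.sh"
  [ -f "$udf_dir/package.json" ]     && echo "npm install, zip, upload to \$S3_LOC"
  [ -f "$udf_dir/pom.xml" ]          && echo "mvn package, upload jar to \$S3_LOC"
  [ -f "$udf_dir/lambda.yaml" ]      && echo "aws cloudformation deploy --template-file lambda.yaml"
  echo "run function.sql (substituting :RedshiftRole for lambda UDFs)"
}

# Hypothetical UDF directory containing only a lambda.yaml:
mkdir -p /tmp/f_demo && touch /tmp/f_demo/lambda.yaml
deploy_udf /tmp/f_demo
```

Each optional file simply enables an extra step; `function.sql` is always executed last.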
```shell
./deployFunction.sh -t lambda-udfs -f "f_upper_python(varchar)" -c $CLUSTER -d $DB -u $USER -n $SCHEMA -r $REDSHIFT_ROLE
./deployFunction.sh -t python-udfs -f "f_ua_parser_family(varchar)" -c $CLUSTER -d $DB -u $USER -n $SCHEMA -r $REDSHIFT_ROLE -s $S3_LOC
./deployFunction.sh -t sql-udfs -f "f_mask_varchar(varchar,varchar,varchar)" -c $CLUSTER -d $DB -u $USER -n $SCHEMA
```
The `testFunction.sh` script tests the UDF by:
- Creating a temporary table containing the input parameters of the function.
- Loading sample input data for the function from the `input.csv` file.
- Running the function against the sample data and comparing the output to the `output.csv` file.
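The compare step can be approximated locally. In this sketch the Redshift call is simulated with `awk` for a hypothetical `f_multiply_int` function; the real script runs the function in the cluster instead:

```shell
# Simulate testFunction.sh's compare step for a hypothetical f_multiply_int.
printf '2,3\n4,5\n' > input.csv     # sample input parameters
printf '6\n20\n'    > output.csv    # expected output values

# Stand-in for "run the function against the sample data".
awk -F, '{ print $1 * $2 }' input.csv > actual.csv

# The UDF passes when actual output matches the expected output exactly.
if diff -q actual.csv output.csv >/dev/null; then echo "PASS"; else echo "FAIL"; fi
```

Because rows are compared positionally, the function must return results in the same order as the sample inputs.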
```shell
./testFunction.sh -t lambda-udfs -f "f_upper_python(varchar)" -c $CLUSTER -d $DB -u $USER -n $SCHEMA
./testFunction.sh -t python-udfs -f "f_ua_parser_family(varchar)" -c $CLUSTER -d $DB -u $USER -n $SCHEMA
./testFunction.sh -t sql-udfs -f "f_mask_varchar(varchar,varchar,varchar)" -c $CLUSTER -d $DB -u $USER -n $SCHEMA
```
We would love your contributions. See the contributing page for more details on creating a fork of the project and a pull request of your contribution.
Pull requests will be tested using a GitHub workflow which leverages the above testing scripts. Please execute these scripts prior to submitting a pull request to ensure the request is approved quickly. When executed in the test environment, the `RedshiftRole` will be defined as follows. You can create a similar role in your local environment for testing.
These privileges ensure the UDF can invoke the Lambda function as well as access the uploaded `*.whl` files located in S3.
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "lambda:InvokeFunction"
      ],
      "Resource": [
        "arn:aws:lambda:*:*:function:f-*",
        "arn:aws:s3:::<test bucket>/*"
      ]
    }
  ]
}
```