redshift_connector

redshift_connector is the Amazon Redshift connector for Python. Easy integration with pandas and numpy, as well as support for numerous Amazon Redshift specific features help you get the most out of your data

Supported Amazon Redshift features include:

IAM authentication
Identity provider (IdP) authentication
Redshift specific data types

This pure Python connector implements Python Database API Specification 2.0.

Getting Started

Install from Binary

Package Manager	Downloads	Installation Command
PyPi		`pip install redshift_connector`
Conda		`conda install -c conda-forge redshift_connector`

Install from Source

You may install from source by cloning this repository.

$ git clone https://github.com/aws/amazon-redshift-python-driver.git
$ cd redshift_connector
$ pip install .

Basic Example

import redshift_connector

# Connects to Redshift cluster using AWS credentials
conn = redshift_connector.connect(
    host='examplecluster.abc123xyz789.us-west-1.redshift.amazonaws.com',
    database='dev',
    user='awsuser',
    password='my_password'
 )

cursor: redshift_connector.Cursor = conn.cursor()
cursor.execute("create Temp table book(bookname varchar,author varchar)")
cursor.executemany("insert into book (bookname, author) values (%s, %s)",
                    [
                        ('One Hundred Years of Solitude', 'Gabriel García Márquez'),
                        ('A Brief History of Time', 'Stephen Hawking')
                    ]
                  )
cursor.execute("select * from book")

result: tuple = cursor.fetchall()
print(result)
>> (['One Hundred Years of Solitude', 'Gabriel García Márquez'], ['A Brief History of Time', 'Stephen Hawking'])

Enabling autocommit

Following the DB-API specification, autocommit is off by default. It can be turned on by using the autocommit property of the connection.

# Make sure we're not in a transaction
con.rollback()
con.autocommit = True
con.run("VACUUM")
con.autocommit = False

Configuring cursor paramstyle

The paramstyle for a cursor can be modified via cursor.paramstyle. The default paramstyle used is format. Valid values for paramstyle include qmark, numeric, named, format, pyformat.

# qmark
redshift_connector.paramstyle = 'qmark'
sql = 'insert into foo(bar, jar) VALUES(?, ?)'
cursor.execute(sql, (1, "hello world"))

# numeric
redshift_connector.paramstyle = 'numeric'
sql = 'insert into foo(bar, jar) VALUES(:1, :2)'
cursor.execute(sql, (1, "hello world"))

# named
redshift_connector.paramstyle = 'numeric'
sql = 'insert into foo(bar, jar) VALUES(:p1, :p2)'
cursor.execute(sql, p1=1, p2="hello world")

# format
redshift_connector.paramstyle = 'format'
sql = 'insert into foo(bar, jar) VALUES(%s, %s)'
cursor.execute(sql, (1, "hello world"))

# pyformat
redshift_connector.paramstyle = 'pyformat'
sql = 'insert into foo(bar, jar) VALUES(%(bar)s, %(jar)s)'
cursor.execute(sql, {"bar": 1, "jar": "hello world"})

Example using IAM Credentials

IAM Credentials can be supplied directly to connect(...) using an AWS profile as shown below:

import redshift_connector

# Connects to Redshift cluster using IAM credentials from default profile defined in ~/.aws/credentials
conn = redshift_connector.connect(
    iam=True,
    database='dev',
    db_user='awsuser',
    password='',
    user='',
    cluster_identifier='examplecluster',
    profile='default'
 )

# ~/.aws/credentials
[default]
aws_access_key_id="my_aws_access_key_id"
aws_secret_access_key="my_aws_secret_access_key"
aws_session_token="my_aws_session_token"

# ~/.aws/config
[default]
region=us-west-2

If a region is not provided in ~/.aws/config or you would like to override its value, region may be passed to connect(...).

Alternatively, IAM credentials can be supplied directly to connect(...) using AWS credentials as shown below:

import redshift_connector

# Connects to Redshift cluster using IAM credentials from default profile defined in ~/.aws/credentials
conn = redshift_connector.connect(
    iam=True,
    database='dev',
    db_user='awsuser',
    password='',
    user='',
    cluster_identifier='examplecluster',
    access_key_id="my_aws_access_key_id",
    secret_access_key="my_aws_secret_access_key",
    session_token="my_aws_session_token",
    region="us-east-2"
 )

Integration with pandas

import pandas
cursor.execute("create Temp table book(bookname varchar,author varchar)")
cursor.executemany("insert into book (bookname, author) values (%s, %s)",
                   [
                       ('One Hundred Years of Solitude', 'Gabriel García Márquez'),
                       ('A Brief History of Time', 'Stephen Hawking')

                   ])
cursor.execute("select * from book")
result: pandas.DataFrame = cursor.fetch_dataframe()
print(result)
>>                         bookname                 author
>> 0  One Hundred Years of Solitude  Gabriel García Márquez
>> 1        A Brief History of Time         Stephen Hawking

Integration with numpy

import numpy
cursor.execute("select * from book")

result: numpy.ndarray = cursor.fetch_numpy_array()
print(result)
>> [['One Hundred Years of Solitude' 'Gabriel García Márquez']
>>  ['A Brief History of Time' 'Stephen Hawking']]

Query using functions

cursor.execute("SELECT CURRENT_TIMESTAMP")
print(cursor.fetchone())
>> [datetime.datetime(2020, 10, 26, 23, 3, 54, 756497, tzinfo=datetime.timezone.utc)]

Connection Parameters

Name	Type	Description	Default Value	Required
database	str	The name of the database to connect to		Yes
host	str	The hostname of Amazon Redshift cluster		Yes
password	str	The password to use for authentication		Yes
user	str	The username to use for authentication		Yes
access_key_id	str	The access key for the IAM role or IAM user configured for IAM database authentication	None	No
allow_db_user_override	bool	True specifies the driver uses the DbUser value from the SAML assertion while False indicates the value in the DbUser connection parameter is used	FALSE	No
app_name	str	The name of the IdP application used for authentication	None	No
auto_create	bool	Indicates whether the user should be created if they do not exist	FALSE	No
client_id	str	The client id from Azure IdP	None	No
client_secret	str	The client secret from Azure IdP	None	No
cluster_identifier	str	The cluster identifier of the Amazon Redshift Cluster	None	No
credentials_provider	str	The IdP that will be used for authenticating with Amazon Redshift.	None	No
db_groups	str	A comma-separated list of existing database group names that the DbUser joins for the current session	None	No
db_user	str	The user ID to use with Amazon Redshift	None	No
client_protocol_version	int	The requested server protocol version. The default value is 2 representing BINARY. If the requested server protocol cannot be satisfied a warning will be displayed to the user and the driver will default to the highest supported protocol. See ClientProtocolVersion for more details.	2	No
iam	bool	If IAM Authentication is enabled	FALSE	No
idp_response_timeout	int	The timeout for retrieving SAML assertion from IdP	120	No
idp_tenant	str	The IdP tenant	None	No
listen_port	int	The listen port IdP will send the SAML assertion to	7890	No
login_url	str	The SSO Url for the IdP	None	No
max_prepared_statements	int	The maximum number of prepared statements that can be open at once	1000	No
port	Int	The port number of the Amazon Redshift cluster	5439	No
preferred_role	str	The IAM role preferred for the current connection	None	No
profile	str	The name of a profile in a AWS credentials file that contains AWS credentials.	None	No
region	str	The AWS region where the cluster is located	None	No
secret_access_key_id	str	The secret access key for the IAM role or IAM user configured for IAM database authentication	None	No
session_token	str	The access key for the IAM role or IAM user configured for IAM database authentication. Not required unless temporary AWS credentials are being used.	None	No
ssl	bool	If SSL is enabled	TRUE	No
ssl_insecure	bool	Specifies if IDP hosts server certificate will be verified	TRUE	No
sslmode	str	The security of the connection to Amazon Redshift. verify-ca and verify-full are supported.	verify_ca	No
database_metadata_current_db_only	bool	Indicates if application supports multi-database datashare catalogs. Default value of True indicates application does not support multi-database datashare catalogs for backwards compatibility	TRUE	No
web_identity_token	str	The OAuth 2.0 access token or OpenID Connect ID token that is provided by the identity provider. Your application must get this token by authenticating the user who is using your application with a web identity provider. This parameter is used by JwtCredentialsProvider. For this provider, this is a mandatory parameter.	None	No
role_session_name	str	An identifier for the assumed role session. Typically, you pass the name or identifier that is associated with the user who is using your application. That way, the temporary security credentials that your application will use are associated with that user. This parameter is used by JwtCredentialsProvider. For this provider, this is an optional parameter.	jwt_redshift_session	No
role_arn	str	The Amazon Resource Name (ARN) of the role that the caller is assuming. This parameter is used by JwtCredentialsProvider. For this provider, this is a mandatory parameter.	None	No

Supported Datatypes

redshift_connector supports the following Amazon Redshift datatypes. redshift_connector will attempt to treat unsupported datatypes as strings. Incoming data from Amazon Redshift is treated as follows:

Amazon Redshift Datatype	Python Datatype
ACLITEM	str
BOOLEAN	bool
INT8	int
INT4	int
INT2	int
VARCHAR	str
OID	int
REGPROC	int
XID	int
JSON	dict
FLOAT4	float
FLOAT8	float
TEXT	str
CHAR	str
DATE	datetime.date
TIME	datetime.time
TIMETZ	datetime.time
TIMESTAMP	datetime.datetime
TIMESTAMPTZ	datetime.datetime
NUMERIC	decimal.Decimal
GEOMETRY	str
SUPER	str

Logging

redshift_connector uses logging for providing detailed error messages regarding IdP authentication. A do-nothing handler is enabled by default as to prevent logs from being output to sys.stderr.

Enable logging in your application to view logs output by redshift_connector as described in the documentation for Python logging module.

Client Transfer Protocol

redshift_connector requests the Amazon Redshift server use the highest transfer protocol version supported. As of v2.0.879 binary transfer protocol is requested by default. If necessary, the requested transfer protocol can be modified via the client_protocol_version parameter of redshift_connector.connect(...). Please see the Connection Parameters table for more details.

Getting Help

Ask a question on Stack Overflow and tag it with redshift_connector
Open a support ticket with AWS Support
If you may have found a bug, please open an issue

Contributing

We look forward to collaborating with you! Please read through CONTRIBUTING before submitting any issues or pull requests.

Running Tests

You can run tests by using pytest test/unit. This will run all unit tests. Integration tests require providing credentials for an Amazon Redshift cluster as well as IdP attributes in test/config.ini.

Additional Resources

LICENSE

maimoonaiqbal2000 / amazon-redshift-python-driver Goto Github PK

amazon-redshift-python-driver's Introduction