awslabs / amazon-redshift-monitoring
Amazon Redshift Advanced Monitoring
License: Apache License 2.0
I deployed this using SAM and I can see the data being fetched from the Redshift cluster within 1 minute; however, the Lambda function still doesn't complete after running for 5 minutes.
Lambda logs report:
Executing Redshift Diagnostic Query: WLMQuerySlotCountWarning
Publishing 24 CloudWatch Metrics
END RequestId: b054f912-b614-11e8-aa9e-d5eb851e7827
REPORT RequestId: b054f912-b614-11e8-aa9e-d5eb851e7827 Duration: 300005.52 ms Billed Duration: 300000 ms Memory Size: 192 MB Max Memory Used: 32 MB
2018-09-11T22:52:41.248Z b054f912-b614-11e8-aa9e-d5eb851e7827 Task timed out after 300.01 seconds
CloudWatch log report:
22:47:42 Executing Redshift Diagnostic Query: WLMQuerySlotCountWarning
22:47:42 Publishing 24 CloudWatch Metrics
22:52:41 END RequestId: b054f912-b614-11e8-aa9e-d5eb851e7827
22:52:41 REPORT RequestId: b054f912-b614-11e8-aa9e-d5eb851e7827 Duration: 300005.52 ms Billed Duration: 300000 ms Memory Size: 192 MB Max Memory Used: 32 MB
22:52:41 2018-09-11T22:52:41.248Z b054f912-b614-11e8-aa9e-d5eb851e7827 Task timed out after 300.01 seconds
22:52:41 Pushing metrics to CloudWatch failed: exception ('Connection aborted.', error(1, 'Operation not permitted'))
22:52:41 /var/task/redshift_monitoring.py:249: SyntaxWarning: name 'debug' is assigned to before global declaration
22:52:41   global debug
Hi there,
I'm trying to set this up using the links you provide below, but I can't complete the setup. I get a message saying:
Check the following transforms: ["AWS::Serverless-2016-10-31"] You must use a change set to create this stack because it includes one or more transforms.
But when I click the Create Change Set button, nothing happens and Execute is still greyed out.
Am I missing a step?
-- joe.
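If the console's change-set flow stalls, deploying from the CLI sidesteps it: `aws cloudformation deploy` creates and executes the change set, including the `AWS::Serverless-2016-10-31` transform, in one step. A sketch, assuming the template is named `deploy.yaml`; the bucket and stack names are placeholders:

```shell
# Upload local artifacts to S3 and rewrite the template to reference them.
aws cloudformation package \
  --template-file deploy.yaml \
  --s3-bucket my-deploy-bucket \
  --output-template-file packaged.yaml

# Create and execute the change set in one command.
aws cloudformation deploy \
  --template-file packaged.yaml \
  --stack-name redshift-advanced-monitoring \
  --capabilities CAPABILITY_IAM CAPABILITY_AUTO_EXPAND
```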
I deployed the v1.5 zip, which had the fix for the pgpasslib library, but it seems I am now seeing another missing module.
17:15:47 global debug
17:15:47 START RequestId: 51a926fe-5e2d-43a0-adfd-4e8e022e016b Version: $LATEST
17:15:47 Unable to import module 'lambda_function': No module named enum
17:15:47 END RequestId: 51a926fe-5e2d-43a0-adfd-4e8e022e016b
17:15:47 REPORT RequestId: 51a926fe-5e2d-43a0-adfd-4e8e022e016b Duration: 0.50 ms Billed Duration: 100 ms Memory Size: 192 MB Max Memory Used: 32 MB
17:16:44 START RequestId: 51a926fe-5e2d-43a0-adfd-4e8e022e016b Version: $LATEST
17:16:44 Unable to import module 'lambda_function': No module named enum
17:16:44 END RequestId: 51a926fe-5e2d-43a0-adfd-4e8e022e016b
17:16:44 REPORT RequestId: 51a926fe-5e2d-43a0-adfd-4e8e022e016b Duration: 0.68 ms Billed Duration: 100 ms Memory Size: 192 MB Max Memory Used: 32 MB
17:18:35 START RequestId: 51a926fe-5e2d-43a0-adfd-4e8e022e016b Version: $LATEST
17:18:35 Unable to import module 'lambda_function': No module named enum
17:18:35 END RequestId: 51a926fe-5e2d-43a0-adfd-4e8e022e016b
17:18:35 REPORT RequestId: 51a926fe-5e2d-43a0-adfd-4e8e022e016b Duration: 0.52 ms Billed Duration: 100 ms Memory Size: 192 MB Max Memory Used: 32 MB
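The `python2.7` Lambda runtime has no stdlib `enum`; the usual fix is to vendor the `enum34` backport into the deployment package before zipping. A sketch; the zip name is a placeholder:

```shell
# From the directory containing lambda_function.py and redshift_monitoring.py:
pip install enum34 -t .           # backport of the Python 3 enum module for 2.7
zip -r redshift-monitoring.zip .  # rebuild and re-upload the function code
```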
Are there plans to update this project with an option to avoid encrypting a password with KMS and instead relying on the IAM role attached to the Lambda function to provide authentication to Redshift?
The section of redshift_monitoring.py that handles passwords checks for an unencrypted password and then sets the password to None, so even when an unencrypted password exists, it is never used.
The flow of logic needs to change so that the unencrypted password can be used.
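A minimal sketch of the corrected flow (parameter names are assumptions, not the project's actual identifiers): prefer the plaintext password when one is present, and only fall back to KMS decryption otherwise.

```python
import base64

def resolve_password(plain_pwd, encrypted_pwd, kms_client):
    # If an unencrypted password was configured, use it directly instead
    # of discarding it (the bug described above).
    if plain_pwd:
        return plain_pwd
    # Otherwise decrypt the base64-encoded CiphertextBlob with KMS.
    response = kms_client.decrypt(CiphertextBlob=base64.b64decode(encrypted_pwd))
    return response['Plaintext'].decode('utf-8')
```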
1.4 is still missing pgpasslib. Please update it.
Non-Lambda version, please? We're in Australia, managing clients with Redshift clusters, but Lambda is not available here yet.
The URL for us-east-1 is incorrect, so the Launch Stack link doesn't work. It currently points to:
https://s3-us-east-1.amazonaws.com/awslabs-code-us-east-1/RedshiftAdvancedMonitoring/deploy-vpc.yaml
It should be:
https://s3.amazonaws.com/awslabs-code-us-east-1/RedshiftAdvancedMonitoring/deploy-vpc.yaml
The metrics cannot be generated with version 1.0.18788; the likely cause is a connection failure. Here is the stack trace:
The read operation timed out: timeout
Traceback (most recent call last):
File "/var/task/lambda_function.py", line 15, in lambda_handler
redshift_monitoring.monitor_cluster(config_sources)
File "/var/task/redshift_monitoring.py", line 329, in monitor_cluster
put_metrics.extend(gather_service_class_stats(cursor, cluster))
File "/var/task/redshift_monitoring.py", line 124, in gather_service_class_stats
''')
File "/var/task/redshift_monitoring.py", line 98, in run_command
cursor.execute(statement)
File "/var/task/lib/pg8000/core.py", line 861, in execute
self._c.execute(self, operation, args)
File "/var/task/lib/pg8000/core.py", line 1909, in execute
self.handle_messages(cursor)
File "/var/task/lib/pg8000/core.py", line 1972, in handle_messages
code, data_len = ci_unpack(self._read(5))
File "/var/lang/lib/python3.6/socket.py", line 586, in readinto
return self._sock.recv_into(b)
File "/var/lang/lib/python3.6/ssl.py", line 1012, in recv_into
return self.read(nbytes, buffer)
File "/var/lang/lib/python3.6/ssl.py", line 874, in read
return self._sslobj.read(len, buffer)
File "/var/lang/lib/python3.6/ssl.py", line 631, in read
v = self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out
In comparison, we have another cluster on the latest version (1.0.18861) which generated and updated the metrics successfully, so we suspect the Redshift version may be a factor.
NOTE
We modified the code to make the timeout an input parameter. The changes to redshift_monitoring.py are listed below:
line 262 # Add timeout to config_sources
line 263 timeout = int(get_config_value(['TimeOut', 'time_out', 'timeOut'], config_sources))
..........
line 305 conn = pg8000.connect(database=database, user=user, password=pwd, host=host, port=port, ssl=ssl, timeout=timeout)
The input JSON of the CloudWatch Events rule:
{
  "DbUser": "xxxxxxx",
  "EncryptedPassword": "**************",
  "ClusterName": "xxxxxxxxxx",
  "HostName": "xxxxxxxxxxxxxx",
  "HostPort": "xxxx",
  "DatabaseName": "xxxxxxxxxx",
  "AggregationInterval": "1 hour",
  "TimeOut": "20"
}
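The patch above can be hardened slightly so that existing configurations without a TimeOut key keep working; a sketch (the 30-second default is an assumption):

```python
def get_timeout(config, default=30):
    """Return the socket timeout in seconds, accepting any of the key
    spellings used in the patch and falling back to `default` when the
    key is absent or not a valid integer."""
    for key in ('TimeOut', 'time_out', 'timeOut'):
        if key in config:
            try:
                return int(config[key])
            except (TypeError, ValueError):
                break
    return default
```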
P.S. Keep support for pgpasslib.
Hi again - I've got the function successfully running hourly, but I don't see any new metrics resulting from it. Here's an error reported in the logs that might be related:
(u'ERROR', u'42P01', u'relation "sensor_data" does not exist', u'/home/ec2-user/padb/src/pg/src/backend/catalog/namespace.c', u'237', u'RangeVarGetRelid'): ProgrammingError
Traceback (most recent call last):
File "/var/task/lambda_function.py", line 328, in lambda_handler
put_metrics.extend(run_external_commands('User Configured', 'user-queries.json', cursor, cluster))
File "/var/task/lambda_function.py", line 93, in run_external_commands
interval = run_command(cursor, command['query'])
File "/var/task/lambda_function.py", line 129, in run_command
cursor.execute(statement)
File "/var/task/lib/pg8000/core.py", line 852, in execute
self._c.execute(self, operation, args)
File "/var/task/lib/pg8000/core.py", line 1741, in execute
self.handle_messages(cursor)
File "/var/task/lib/pg8000/core.py", line 1879, in handle_messages
raise self.error
ProgrammingError: (u'ERROR', u'42P01', u'relation "sensor_data" does not exist', u'/home/ec2-user/padb/src/pg/src/backend/catalog/namespace.c', u'237', u'RangeVarGetRelid')
This error is reported at least twice for each Lambda run, amongst the series of diagnostic queries. Here's a screenshot. I do not see any new metrics for Redshift listed.
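One way to keep a single bad user-configured query (like the missing "sensor_data" relation) from surfacing errors on every run is to isolate each query in its own try/except. A sketch, not the project's actual code:

```python
def run_user_queries(cursor, commands):
    """Execute each user-configured query independently, logging failures
    (e.g. a relation that doesn't exist) instead of aborting the batch."""
    results = []
    for command in commands:
        try:
            cursor.execute(command['query'])
            results.append((command['name'], cursor.fetchone()))
        except Exception as e:
            print("User query '%s' failed: %s" % (command['name'], e))
    return results
```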
Unable to deploy latest v1.5 in us-west-2, S3 bucket permissions/access denied.
Unable to import module 'lambda_function': No module named pgpasslib
I used the VPC template for us-east-1
I'm getting the error in the subject each time the function is invoked. Is there a way for me to better diagnose the issue? The function logs in CloudWatch aren't very verbose beyond pointing out the module error.
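If the deployed zip is missing pgpasslib, rebuilding the package with the dependency vendored next to the handler usually resolves the import error. A sketch; the zip name is a placeholder:

```shell
# From the directory containing lambda_function.py and redshift_monitoring.py:
pip install pgpasslib -t .        # vendor the dependency into the package root
zip -r redshift-monitoring.zip .  # rebuild and re-upload the function code
```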
I have set up the KMS key in the same region where the Redshift cluster resides, encrypted the database user password using the command line "aws kms encrypt --key-id $KEY_ID --plaintext ", and edited the lambda_function.py script to fill in the configuration, setting the enc_password field to the "CiphertextBlob" output of that command. Now, when I run a test on the Lambda function, the decrypt step times out. Any suggestion on why it is timing out would be appreciated.
Hi,
It seems that the aggregation interval is not used in the code. I would like to have some monitors run at one interval (every 5 minutes, for example) and other monitors run at a different one (every hour).