This guide shows how to deploy a data streaming pipeline on AWS that enriches and transforms ingested logs using the Kinesis and Glue services.
In this PoC, a Python script generates logs that are ingested by a Kinesis Data Stream. Each simulated log contains a “port_number” field. A Kinesis Data Analytics application transforms the log data from the input Kinesis Data Stream and writes the curated logs to a second Kinesis Data Stream, enriching each record with a “tag” field whose value depends on the “port_number” field. A Kinesis Data Firehose delivery stream reads from the curated-log stream and delivers the data to an S3 bucket, where it is partitioned by year, month, day, and hour.
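The log generation and enrichment described above can be sketched as follows. This is a minimal, hypothetical example: the actual port-to-tag mapping and record schema used in the PoC may differ, and the names `PORT_TAGS`, `generate_log`, and `enrich` are illustrative.

```python
import json
import random

# Hypothetical port-to-tag mapping; the PoC's actual mapping may differ.
PORT_TAGS = {22: "ssh", 80: "http", 443: "https"}

def generate_log() -> dict:
    """Simulate a raw log record containing a 'port_number' field."""
    return {"port_number": random.choice(list(PORT_TAGS) + [8080])}

def enrich(record: dict) -> dict:
    """Add a 'tag' field derived from 'port_number', mirroring what the
    Kinesis Data Analytics application does to produce curated logs."""
    tagged = dict(record)
    tagged["tag"] = PORT_TAGS.get(record["port_number"], "other")
    return tagged

# A producer would serialize each record and put it onto the input stream,
# e.g. with boto3: kinesis.put_record(StreamName=..., Data=json.dumps(record),
# PartitionKey=str(record["port_number"])) -- this requires AWS credentials,
# so it is shown here only as a comment.
```

In the deployed pipeline this tagging runs inside Kinesis Data Analytics rather than in the producer script; the sketch just illustrates the mapping logic.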
A detailed walkthrough of the deployment steps can be found here (https://quip-amazon.com/9y2SAbT7CWRS#XQA9AAC7E4R).
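The year/month/day/hour partitioning in S3 corresponds to the hour-granularity key prefix that Firehose applies by default when delivering to S3 (based on the record's UTC arrival time). A small sketch of how such a prefix is formed:

```python
from datetime import datetime, timezone

def s3_prefix(ts: datetime) -> str:
    """Build the YYYY/MM/DD/HH/ key prefix under which Firehose
    places delivered objects in the S3 bucket (UTC)."""
    return ts.strftime("%Y/%m/%d/%H/")

# A record delivered at 2023-05-04 09:15 UTC would land under "2023/05/04/09/".
print(s3_prefix(datetime(2023, 5, 4, 9, 15, tzinfo=timezone.utc)))
```

This hour-level layout is what allows Athena (via the Glue table) to prune partitions when querying the curated logs.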
The stack deploys the following resources:

Logical ID | Type |
---|---|
AnalyticsApplication | AWS::KinesisAnalyticsV2::Application |
AnalyticsServiceExecutionRole | AWS::IAM::Role |
AthenaWorkgroup | AWS::Athena::WorkGroup |
FirehoseServiceExecutionRole | AWS::IAM::Role |
FirehoseStream | AWS::KinesisFirehose::DeliveryStream |
GlueDatabase | AWS::Glue::Database |
GlueTable | AWS::Glue::Table |
InputStream | AWS::Kinesis::Stream |
OutputStream | AWS::Kinesis::Stream |
S3Logs | AWS::S3::Bucket |