
This project is a fork of heroku-examples/analytics-with-kafka-redshift-metabase.


Example Product/User Analytics System Using Apache Kafka, AWS Redshift, and Metabase

This is an example of a system that captures a large stream of product usage data, or events, and provides both real-time data visualization and SQL-based data analytics. The stream of events is captured by Apache Kafka and made available to other downstream consumers. In this example, there are two downstream consumers of the data. The data flowing through Kafka can be viewed in near real-time using a web-based data visualization app. The other consumer stores all the data in AWS Redshift, a relational database that Amazon describes as "a fast, scalable data warehouse." Then we can query and visualize the data in Redshift from a SQL-compliant analytics tool. This example uses Metabase deployed to Heroku. Metabase is an open-source analytics tool used by many organizations, large and small.

This entire system can be deployed in 15 minutes -- most of that time spent waiting for Heroku and AWS to provision services -- and it requires very little ongoing operational maintenance.

Here's an overview of how the system works.

Structure

This project includes 3 apps:

  1. A data producer called generate_data. Data is simulated in this example, but it could be replaced by almost anything that produces data: a marketing website, a SaaS product, a point-of-sale device, a kiosk, an internet-connected thermostat, or a car. More than one data producer can be added.
  2. A real-time data visualizer called viz, which shows relative volume of different categories of data being written into Kafka.
  3. A Kafka-to-Redshift writer called redshift_batch, which reads data from Kafka and writes it to Redshift.

They all share data using Apache Kafka on Heroku.

You can optionally deploy Metabase to Heroku to query Redshift. Check out Metabase's Heroku Deploy Button.
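To make the data flow concrete, here is a minimal, self-contained sketch (plain Node.js, no Kafka client required) of the two halves: a generate_data-style event producer and the batch-and-flush logic a redshift_batch-style consumer would apply before bulk-writing rows into Redshift. The event fields and batch size are illustrative assumptions, not the project's actual schema.

```javascript
// Sketch of the event shape and the batch-and-flush pattern.
// Event fields and the batch size are illustrative assumptions.

// generate_data side: simulate one product-usage event.
function makeEvent(category) {
  return {
    category,       // e.g. a product category being viewed
    ts: Date.now(), // event timestamp (ms since epoch)
  };
}

// redshift_batch side: accumulate events and flush in batches,
// as a consumer would before issuing a bulk write to Redshift.
function makeBatcher(flush, batchSize = 3) {
  let buffer = [];
  return {
    add(event) {
      buffer.push(event);
      if (buffer.length >= batchSize) {
        flush(buffer);
        buffer = [];
      }
    },
    pending() { return buffer.length; },
  };
}

// Wire the two together without a broker: events that would travel
// through Kafka are handed straight to the batcher.
const flushed = [];
const batcher = makeBatcher((batch) => flushed.push(batch));
['shoes', 'hats', 'shoes', 'bags'].forEach((c) => batcher.add(makeEvent(c)));

console.log(flushed.length, batcher.pending()); // 1 batch flushed, 1 event pending
```

In the real system, the `flush` callback is where a consumer would issue its write to Redshift; everything else is just buffering.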

Deploy

Prerequisites

  • An AWS Redshift cluster. Check out this Terraform script for an easy way to create a Redshift cluster along with a Heroku Private Space and a private peering connection between the Heroku Private Space and the Redshift's AWS VPC. Not free! This will incur cost on AWS and Heroku.
  • Node.js

Deploy to Heroku

git clone git@github.com:heroku-examples/kafka-stream-viz.git
cd kafka-stream-viz
heroku create
heroku addons:create heroku-kafka:basic-0
heroku kafka:topics:create ecommerce-logs
heroku kafka:consumer-groups:create redshift-batch
heroku config:set KAFKA_TOPIC=ecommerce-logs
heroku config:set KAFKA_CMD_TOPIC=audience-cmds
heroku config:set KAFKA_WEIGHT_TOPIC=weight-updates
heroku config:set KAFKA_CONSUMER_GROUP=redshift-batch
heroku config:set FIXTURE_DATA_S3='s3://aws-heroku-integration-demo/fixture.csv'
git push heroku master

Alternatively, you can use the Heroku Deploy button:

Deploy

And then create the necessary Kafka topic and consumer group:

heroku kafka:topics:create ecommerce-logs #this can also be created at https://data.heroku.com/
heroku kafka:topics:create audience-cmds #this can also be created at https://data.heroku.com/
heroku kafka:topics:create weight-updates #this can also be created at https://data.heroku.com/
heroku kafka:consumer-groups:create redshift-batch

Optionally, you can deploy Metabase to Heroku and use SQL to query and visualize data in Redshift. Use Metabase's Heroku Deploy button. Once deployed, you'll need to configure Metabase with the Redshift cluster URL, database name, username, and password.

Deploy Locally

git clone git@github.com:heroku-examples/kafka-stream-viz.git
cd kafka-stream-viz
npm i

Run

The following environment variables must be defined. If you used the Heroku deploy instructions above, all of the variables are already defined except for DATABASE_URL.

  • DATABASE_URL: Connection string to an AWS Redshift cluster
  • FIXTURE_DATA_S3: S3 path to CSV of fixture data to load into Redshift before starting data stream through Kafka (e.g. s3://aws-heroku-integration-demo/fixture.csv)
  • KAFKA_URL: Comma-separated list of Apache Kafka broker URLs
  • KAFKA_CLIENT_CERT: Contents of the client certificate (in PEM format) to authenticate clients against the broker
  • KAFKA_CLIENT_CERT_KEY: Contents of the client certificate key (in PEM format) to authenticate clients against the broker
  • KAFKA_TOPIC: Kafka topic the system will produce to and consume from
  • KAFKA_CMD_TOPIC: Kafka topic the system will read audience cmds from
  • KAFKA_WEIGHT_TOPIC: Kafka topic the system will produce category weight updates to
  • KAFKA_CONSUMER_GROUP: Kafka consumer group used by the redshift_batch process type to write to Redshift
  • KAFKA_PREFIX: (optional) This is only used by Heroku's multi-tenant Apache Kafka plans (i.e. basic plans)
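A client reading these variables typically has to do a little massaging before handing them to a Kafka library: split the comma-separated KAFKA_URL, apply KAFKA_PREFIX to topic and consumer-group names on multi-tenant plans, and pass the PEM contents through as SSL material. The helper names and example values below are hypothetical, shown only to illustrate the transformations:

```javascript
// Sketch: turn the config vars above into client settings.
// Helper names and the example env values are hypothetical.

// Multi-tenant Heroku Kafka plans require every topic and consumer
// group name to carry KAFKA_PREFIX; dedicated plans leave it unset.
function withPrefix(name, env = process.env) {
  return (env.KAFKA_PREFIX || '') + name;
}

// KAFKA_URL is a comma-separated list of kafka+ssl:// broker URLs;
// many clients expect bare host:port pairs instead.
function brokerList(env = process.env) {
  return (env.KAFKA_URL || '')
    .split(',')
    .filter(Boolean)
    .map((u) => u.replace(/^kafka\+ssl:\/\//, ''));
}

// The SSL cert and key are passed through verbatim (PEM contents).
function sslConfig(env = process.env) {
  return { cert: env.KAFKA_CLIENT_CERT, key: env.KAFKA_CLIENT_CERT_KEY };
}

const env = {
  KAFKA_PREFIX: 'columbia-1234.', // example prefix, not a real plan value
  KAFKA_URL: 'kafka+ssl://host1:9096,kafka+ssl://host2:9096',
};
console.log(withPrefix('ecommerce-logs', env)); // columbia-1234.ecommerce-logs
console.log(brokerList(env)); // [ 'host1:9096', 'host2:9096' ]
```

When KAFKA_PREFIX is unset (dedicated plans), `withPrefix` returns the name unchanged, so the same code runs against both plan types.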

Then in each of the generate_data, viz, and redshift_batch directories, run npm start.

Open the URL in the startup output of the viz app. It will likely be http://localhost:3000.
