Datacroft: Analytics Stack

Datacroft: Analytics Stack was developed by FELD M to help you easily set up a pipeline for reporting and analytics. The stack includes the following components:

  • 🚂 a data loading/EL tool (Airbyte) to help you get data from different sources into your data warehouse
  • 🗄 a simple data warehouse (a PostgreSQL database) where you can store all your data
  • 🪄 a data modeling tool (dbt) to help you transform your raw data into ready-to-use tables
  • 🔭 a visualization tool (Metabase) where you can build dashboards and explore your data

We assembled this stack to be easy to spin up, with minimal required resources so you can pull your data and experiment with it as fast as possible. If you have your own machine to run this on (such as a private server, or even just your own laptop), the whole stack is free to run and use, since all the tools are open-source.

The Stack

It includes the following components as Docker containers:

[Image: diagram of the stack's components]

These run with Docker Compose, so you need a private server or another machine to run them on. If you prefer a cloud-based setup, all the tools above are also available as cloud subscriptions with free trials.

You can also swap any of the tools above for a different one. For example, to use Tableau instead of Metabase, comment out the Metabase section in the Docker Compose file and connect your Tableau instance to the data warehouse. You could also use a cloud-based warehouse such as BigQuery.

Requirements

Since we use Docker Compose to run the stack, you will need Docker installed on your machine. We also use a Makefile that wraps the commands you need to run the stack.

This stack was tested on macOS and Ubuntu.

Quickstart

To set up a pipeline on your machine:

  1. Make sure that Docker is running
  2. Edit the following configuration files (this step is optional: if you skip it, the stack still runs, and the tools are configured with the default values and credentials set in the .env files):
    • .env: credentials and configurations for the PostgreSQL data warehouse, dbt, and Metabase
    • airbyte.env: credentials and configuration preferences for Airbyte
  3. Open a terminal in the project directory and execute make run to start the containers
  4. Access the applications as described below.
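
If you want to review which settings the two configuration files define before starting the containers, a minimal Python sketch like the following can list them. It assumes the files use plain KEY=VALUE lines with optional # comments, which is the usual .env layout but is not confirmed by this README. The function name read_env_file is hypothetical and not part of the project.

```python
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    settings = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and one layer of optional quotes.
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


if __name__ == "__main__":
    # File names come from the Quickstart; run this from the project directory.
    for name in (".env", "airbyte.env"):
        path = Path(name)
        if not path.exists():
            print(f"{name}: not found in the current directory")
            continue
        keys = sorted(read_env_file(path))
        print(f"{name}: {len(keys)} settings -> {', '.join(keys)}")
```

The script prints only the setting names, not their values, so credentials do not end up in your terminal history.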

To access the applications:

  • PostgreSQL: connect to [host]:5432 using a PostgreSQL client
  • Airbyte: open [host]:8000 on a web browser
  • Metabase: open [host]:3000 on a web browser

If you're running this on your own machine, [host] will be localhost, so Airbyte will be available at localhost:8000, Metabase at localhost:3000, and PostgreSQL at localhost:5432.
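
To check quickly whether the three services are listening, a small Python sketch such as the one below can try a TCP connection to each port. The ports are the ones listed above; the host localhost and the helper name is_port_open are assumptions for a local run. An open port only shows that something is listening, not that the application has finished starting.

```python
import socket

# Ports as listed above; adjust the host if the stack runs on another machine.
SERVICES = {"PostgreSQL": 5432, "Airbyte": 8000, "Metabase": 3000}


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution errors.
        return False


if __name__ == "__main__":
    host = "localhost"  # assumption: the stack runs on this machine
    for name, port in SERVICES.items():
        status = "reachable" if is_port_open(host, port) else "not reachable"
        print(f"{name:<10} {host}:{port} {status}")
```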

To stop the containers:

Go to the project folder in the terminal and run make stop.

Some notes

Note that the first time you run make run, the command might take a while, since Docker has to pull all the images. After the command finishes, Metabase will not be available immediately, since it is still setting up. In our experience, this can take around 15 minutes or more.

Once you see that Airbyte and Metabase are accessible from the browser, you may get prompted by Airbyte for a username and password. If you did not change these settings in airbyte.env, the defaults are username airbyte and password password.
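
Rather than refreshing the browser while the applications start, you could poll the web addresses from a script. The following is a minimal Python sketch under a few assumptions: the stack runs on localhost, any HTTP reply (including an error status or a login prompt) counts as "up", and the names http_responds and wait_until_up and the default timings are hypothetical choices, not part of the project.

```python
import http.client
import time
import urllib.error
import urllib.request


def http_responds(url, timeout=5.0):
    """Return True if the server answers the request, even with an error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # A 4xx/5xx reply (for example a login prompt) still means the server is up.
        return True
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, or a broken reply: not up yet.
        return False


def wait_until_up(url, max_wait_s=1800, interval_s=15):
    """Poll url until it responds or max_wait_s elapses; return True on success."""
    deadline = time.monotonic() + max_wait_s
    while True:
        if http_responds(url):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

For a local run, calling wait_until_up("http://localhost:3000") would block until Metabase answers or the 30-minute default expires, and wait_until_up("http://localhost:8000") would do the same for Airbyte.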

Where is the stack best used?

The whole stack runs from two Docker Compose files (one for Airbyte, one for the other tools), which run on a single machine. Thus, it's best for fast, exploratory work, where you want to get your data and experiment with it as soon as you can, or deliver proofs of concept quickly. It's not as suitable for heavy data syncs or for many data connections running frequently.

Note that this limitation is not because of the tools themselves. If you want to scale this stack and use it for production workloads, you can take two paths. You can either:

  1. install each of the tools natively on one or more machines dedicated to your production workload (Airbyte open source, PostgreSQL, dbt Core, and Metabase open source) and, where needed (as with Metabase), configure each of them for production as described in their docs, or
  2. use the cloud-based versions of the tools.

If going for path #2, the stack we commonly use is: Airbyte Cloud + GCP BigQuery + dbt Cloud/dbt Core (which you can also run in GCP if you prefer) + Metabase Cloud/PowerBI/Looker Studio/any BI tool of your choice.

Want some help setting this up?

You can contact us at https://www.feld-m.de/. 😀
