
dimensional-modeling---policytransactions's Introduction

Airflow + dbt + Snowflake: a fact table and dimension tables, including SCD2 dimensions, with historical data loads and incremental updates. Out-of-order records are loaded with a dbt macro, because the out-of-the-box dbt snapshot approach cannot load out-of-order data properly. The project uses different approaches to creating the primary key in dimension tables: a hash of the natural unique key, a sequence, and the next max value of the PK in the table (for training and dbt testing).
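To make the out-of-order problem concrete, here is a minimal Python sketch of the idea. It is not the project's dbt macro. The row layout, the `HIGH_DATE` marker, and the function names `hash_key` and `apply_scd2` are assumptions made for illustration. When a record arrives late, the validity ranges of its neighbours are rebuilt instead of just closing the latest version.

```python
import hashlib
from datetime import date

HIGH_DATE = date(9999, 12, 31)  # assumed marker for the open-ended current version


def hash_key(*parts):
    """Surrogate key built as an MD5 hash of the natural key parts."""
    raw = "|".join(str(p) for p in parts)
    return hashlib.md5(raw.encode("utf-8")).hexdigest()


def apply_scd2(history, record):
    """Return a new SCD2 history with `record` inserted at the right position.

    history: list of dicts (version_key, natural_key, attrs, valid_from, valid_to)
    record:  dict (natural_key, attrs, valid_from); may be older than existing rows
    """
    key = record["natural_key"]
    others = [r for r in history if r["natural_key"] != key]
    # Copy versions of the same key; a version with the same valid_from is replaced.
    versions = [dict(r) for r in history
                if r["natural_key"] == key and r["valid_from"] != record["valid_from"]]
    versions.append({
        "version_key": hash_key(key, record["valid_from"]),
        "natural_key": key,
        "attrs": record["attrs"],
        "valid_from": record["valid_from"],
        "valid_to": HIGH_DATE,
    })
    # Rebuild half-open ranges [valid_from, valid_to) from the sorted versions.
    versions.sort(key=lambda r: r["valid_from"])
    for current, following in zip(versions, versions[1:]):
        current["valid_to"] = following["valid_from"]
    versions[-1]["valid_to"] = HIGH_DATE
    return others + versions


# The February change arrives after the March change (out of order).
history = []
for valid_from, status in [(date(2024, 1, 1), "new"),
                           (date(2024, 3, 1), "renewed"),
                           (date(2024, 2, 1), "endorsed")]:
    history = apply_scd2(history, {"natural_key": "POL-1",
                                   "attrs": {"status": status},
                                   "valid_from": valid_from})
```

After the three calls, the February version sits between the January and March versions, and only the March version remains open-ended.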

Also included: dbt tests, orchestration log tables (dbt model), table load logs (dbt pre-hook), and load-complete emails with the status of the DW tables. The Airflow DAG is built dynamically from the dbt graph.
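One common way to derive a DAG from the dbt graph is to read the `manifest.json` that dbt writes to its target directory and turn each model's `depends_on.nodes` list into task dependencies. The sketch below shows only that dependency extraction in plain Python. The sample manifest and the model names in it are hypothetical, and this is not necessarily how this repository builds its DAG.

```python
from graphlib import TopologicalSorter


def model_dependencies(manifest):
    """Map each dbt model to the upstream models it depends on."""
    nodes = manifest["nodes"]
    models = {uid for uid, node in nodes.items() if node["resource_type"] == "model"}
    return {
        uid: sorted(dep for dep in nodes[uid]["depends_on"]["nodes"] if dep in models)
        for uid in sorted(models)
    }


def run_order(dependencies):
    """Return models in an order where every upstream model comes first."""
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical, heavily trimmed manifest with the fields used above.
sample_manifest = {
    "nodes": {
        "seed.dw.default_values": {"resource_type": "seed", "depends_on": {"nodes": []}},
        "model.dw.stg_policy": {"resource_type": "model", "depends_on": {"nodes": []}},
        "model.dw.dim_policy": {
            "resource_type": "model",
            "depends_on": {"nodes": ["model.dw.stg_policy", "seed.dw.default_values"]},
        },
        "model.dw.fact_policy_trans": {
            "resource_type": "model",
            "depends_on": {"nodes": ["model.dw.stg_policy", "model.dw.dim_policy"]},
        },
    }
}

deps = model_dependencies(sample_manifest)
order = run_order(deps)
```

In an Airflow DAG file, each key of `deps` would typically become one task that runs a single dbt model, with every upstream task set as a predecessor of it.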

Not included: moving data from a source system to staging tables.

Orchestration is done with Airflow branch Python operators and Airflow variables.
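As a rough illustration of that pattern: an Airflow `BranchPythonOperator` runs a Python callable that returns the `task_id` of the task to follow. The sketch below shows such a decision function on its own, with a plain dictionary standing in for Airflow variables. The variable names and task ids are hypothetical.

```python
def is_enabled(variables, name, default="false"):
    """Interpret a variable as a flag; Airflow variables are stored as strings."""
    return str(variables.get(name, default)).strip().lower() in {"true", "1", "yes"}


def choose_branch(variables, flag, run_task_id, skip_task_id):
    """Return the task_id to follow, as a branch callable would."""
    return run_task_id if is_enabled(variables, flag) else skip_task_id


# Stand-in for Airflow variables (hypothetical names).
variables = {"recreate_dw_tables": "false", "run_dbt_tests": "True"}

next_after_start = choose_branch(
    variables, "recreate_dw_tables", "recreate_dw_tables", "define_load_range")
next_after_load = choose_branch(
    variables, "run_dbt_tests", "dbt_test", "skip_dbt_test")
```

Keeping the decision in a small pure function makes each conditional step easy to test without a running scheduler.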

Pipeline Steps

  1. Load start: the load date is set, orchestration logging starts, and the DB connection is synced between dbt and Airflow.
  2. Conditional step: re-create DW tables (dbt full-refresh mode, seed loads). Default values can be loaded into tables at this step.
  3. Define the incremental load range based on the previous load (Airflow hook + dbt analytic query).
  4. Conditional step: validate that staging data is present and ready to be loaded into the DW (Airflow sensor + dbt analytic query).
  5. Conditional step: run transformations in the staging area (run dbt models).
  6. Conditional step: load dimensions (run dbt models and the SCD2 macro).
  7. Conditional step: load transactional fact tables (run dbt models).
  8. Conditional step: load summary fact tables, which are based on the transactional fact tables (run dbt models).
  9. Finalize the load: close the orchestration log (run dbt models) and send the email notification (Airflow hook + dbt analytic query).
  10. Conditional step: test the loaded data (dbt test).
  11. Conditional step: refresh dashboards.
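Step 3 can be pictured with a small sketch. It assumes the orchestration log records, for every load, the end of the range it covered and a status. The field names `range_end` and `status`, the `"complete"` value, and the default start date are all hypothetical. The next range then starts where the last completed load ended.

```python
from datetime import date


def next_load_range(load_log, load_date, initial_start=date(1900, 1, 1)):
    """Return (start, end) for the next incremental load, or None if nothing is due.

    The range is read as start < record_date <= end, so consecutive loads
    neither overlap nor leave gaps.
    """
    completed = [row["range_end"] for row in load_log if row["status"] == "complete"]
    start = max(completed) if completed else initial_start
    if load_date <= start:
        return None  # the requested load date is already covered
    return (start, load_date)


# Hypothetical orchestration log: one completed load and one failed load.
load_log = [
    {"range_end": date(2024, 1, 31), "status": "complete"},
    {"range_end": date(2024, 2, 29), "status": "failed"},
]

first_range = next_load_range([], date(2024, 1, 31))
retry_range = next_load_range(load_log, date(2024, 3, 31))
```

Because the failed February load is ignored, the retry range starts again at the end of January and picks up everything the failed load missed.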

Lineage Graph

[7 images, not reproduced in this text version]

dimensional-modeling---policytransactions's People

Contributors

katerynad
