dbt-graph-theory's Introduction

Hi there 👋

I'm James (he/him), from Dublin in Ireland. I studied Theoretical Physics at Trinity College Dublin. I am interested in applied math and game theory, particularly in relation to ecology. I'd love to do a PhD in this area at some point, if the stars align 😄.

I currently work in a Quant Research department for a trading company.

I have worked at:

  • CRH, Mostly Investor relations
  • Conjura, Analytics, Product Lead & Engineering Manager

There are some open source items that I have worked / am working on:

  • A Universal Semantic Layer (still private, slow WIP with a friend of mine)
  • DBT (I've used it a lot, have a love/hate relationship with it, am a package maintainer, and have been involved with the duckdb adapter)
  • PuffinDB (mostly giving product guidance)
  • Pypulation, a Python library for population dynamics (it's too basic to be useful right now)

And a few ideas I have for the future:

  • A SQL-supporting data lake powered by single-table design on DynamoDB (not designed to be useful)
  • An opinionated SQL linter for building sensible SQL pipelines
  • A DuckDB extension for fMRI data

Get in touch via LinkedIn or email (jpmmcneill AT gmail.com)

dbt-graph-theory's People

Contributors

jpmmcneill

Forkers

bparbhurestore

dbt-graph-theory's Issues

Graph based join reduction

Consider the sets [A,B,C] and [A',B',C'] being joined.

Let's say that there is some complicating data in the underlying join condition, meaning that some elements match more than one element on the other side:

flowchart LR
  A---A'
  B---A'
  B---B'
  C---B'
  C---C'

Intuitively, this could correspond to:

flowchart LR
  A---A'
  B---B'
  C---C'

Namely, we should investigate a graph-based approach to reducing join duplication on both sides, under the implicit assumption that a 1-1 relationship should be constructed.

A more tangible example is:

flowchart LR
  A---A'
  B---A'
  B---B'

which could be reduced to:

flowchart LR
  A---A'
  B---B'
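One possible way to frame this reduction, offered only as a sketch and not as the package's method, is as a maximum matching on the bipartite join graph: each left element is paired with at most one right element, and vice versa. The Python below uses the standard augmenting-path approach (Kuhn's algorithm); the function name `reduce_join` and the edge-list input format are hypothetical. A maximum matching is not unique in general, so a real implementation would need a tie-breaking rule.

```python
from typing import Dict, Hashable, List, Set, Tuple


def reduce_join(edges: List[Tuple[Hashable, Hashable]]) -> Dict[Hashable, Hashable]:
    """Return a one-to-one pairing {left: right} of maximum size (hypothetical helper)."""
    # Map each left node to its candidate right nodes, preserving input order.
    adjacency: Dict[Hashable, List[Hashable]] = {}
    for left, right in edges:
        adjacency.setdefault(left, []).append(right)

    matched_left: Dict[Hashable, Hashable] = {}  # right node -> left node

    def try_assign(left: Hashable, visited: Set[Hashable]) -> bool:
        # Try to give `left` a partner, re-assigning earlier matches if needed.
        for right in adjacency[left]:
            if right in visited:
                continue
            visited.add(right)
            if right not in matched_left or try_assign(matched_left[right], visited):
                matched_left[right] = left
                return True
        return False

    for left in adjacency:
        try_assign(left, set())

    return {left: right for right, left in matched_left.items()}


# The five-edge example from this issue.
example = [("A", "A'"), ("B", "A'"), ("B", "B'"), ("C", "B'"), ("C", "C'")]
pairs = reduce_join(example)
```

In both examples above the matching is forced (A can only pair with A'), so the result does not depend on edge order. In a SQL setting the same idea would have to be expressed differently, for instance with a recursive CTE, which this sketch does not attempt.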

Graph is connected generic test

A connected graph is one where the entire set of nodes forms a single subgraph (from the perspective of the subgraph_identifier macro).

This generic test could invoke the subgraph identifier, and then check that there is only one subgraph id per graph id.

This needs to be a model-level test, and the columns could be inputs to the macro.
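The logic of such a test can be sketched outside SQL. The Python below is only an illustration under assumptions: edges are given as hypothetical `(graph_id, vertex_1, vertex_2)` tuples, and the function `disconnected_graph_ids` returns the graph ids that would fail the test, mirroring how a dbt test returns failing rows. It uses a breadth-first search per graph and does not call the subgraph identifier macro. Isolated vertices that appear in no edge are not handled.

```python
from collections import defaultdict, deque
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable, Hashable]  # (graph_id, vertex_1, vertex_2), assumed layout


def disconnected_graph_ids(edges: Iterable[Edge]) -> List[Hashable]:
    """Return the graph ids whose vertices do not form a single connected subgraph."""
    # Build one undirected adjacency map per graph id.
    graphs: Dict[Hashable, Dict[Hashable, Set[Hashable]]] = defaultdict(
        lambda: defaultdict(set)
    )
    for graph_id, v1, v2 in edges:
        graphs[graph_id][v1].add(v2)
        graphs[graph_id][v2].add(v1)

    failures: List[Hashable] = []
    for graph_id, adjacency in graphs.items():
        # Breadth-first search from an arbitrary vertex of this graph.
        start = next(iter(adjacency))
        seen = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        # The graph is connected only if the search reached every vertex.
        if len(seen) != len(adjacency):
            failures.append(graph_id)
    return failures


rows = [(1, "a", "b"), (1, "b", "c"), (2, "x", "y"), (2, "z", "w")]
failing = disconnected_graph_ids(rows)
```

An empty result corresponds to a passing test.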

CircleCI

With CircleCI, integration tests on other DBs (not just postgres) could be set up.

This would incur some costs, so running them only on PRs to main would probably suffice.

Kahn's algorithm macro

To generate a topologically sorted graph.

This could produce an ordering of the nodes for a given graph.
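For reference, Kahn's algorithm repeatedly emits a node with no remaining incoming edges and removes its outgoing edges; it only yields an ordering when the graph is a directed acyclic graph. The Python below is a minimal sketch of the algorithm itself, not of the proposed macro; the function name and the `(source, target)` edge format are assumptions made for illustration.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, List, Tuple


def kahn_topological_sort(edges: Iterable[Tuple[Hashable, Hashable]]) -> List[Hashable]:
    """Return the nodes in a topological order, or raise if the graph has a cycle."""
    successors: Dict[Hashable, List[Hashable]] = {}
    in_degree: Dict[Hashable, int] = {}
    for source, target in edges:
        successors.setdefault(source, []).append(target)
        successors.setdefault(target, [])
        in_degree.setdefault(source, 0)
        in_degree[target] = in_degree.get(target, 0) + 1

    # Start with every node that has no incoming edges.
    ready = deque(node for node, degree in in_degree.items() if degree == 0)
    ordering: List[Hashable] = []
    while ready:
        node = ready.popleft()
        ordering.append(node)
        # "Remove" the node's outgoing edges by decrementing in-degrees.
        for target in successors[node]:
            in_degree[target] -= 1
            if in_degree[target] == 0:
                ready.append(target)

    # Any node left unprocessed sits on or behind a cycle.
    if len(ordering) != len(in_degree):
        raise ValueError("graph contains a cycle; no topological ordering exists")
    return ordering


order = kahn_topological_sort([("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")])
```

A SQL version would likely need a recursive CTE and an explicit decision on how to report cycles, both of which are left open here.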

CI/CD

Set up framework for running integration tests both locally and on PR.

Connect subgraphs macro

Macro that enforces that all subgraphs in an original table are connected.

Optional node rank

Identify subgraphs macro

Macro that identifies subgraphs, with an optional graph level for multiple graphs in the same table.
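The idea can be illustrated with a union-find (disjoint-set) structure. This Python sketch is not the macro's implementation: the `(graph_id, vertex_1, vertex_2)` tuple layout, the function name `identify_subgraphs`, and the integer subgraph labels are all assumptions. Passing `None` as the graph id treats the whole table as a single graph, which models the optional graph level.

```python
from typing import Dict, Hashable, Iterable, Optional, Tuple

Node = Tuple[Optional[Hashable], Hashable]  # (graph_id, vertex)
Edge = Tuple[Optional[Hashable], Hashable, Hashable]  # (graph_id, vertex_1, vertex_2)


def identify_subgraphs(edges: Iterable[Edge]) -> Dict[Node, int]:
    """Label every (graph_id, vertex) pair with an integer subgraph id."""
    parent: Dict[Node, Node] = {}

    def find(node: Node) -> Node:
        # Walk up to the root, then compress the path behind us.
        root = node
        while parent[root] != root:
            root = parent[root]
        while parent[node] != root:
            parent[node], node = root, parent[node]
        return root

    for graph_id, v1, v2 in edges:
        # Keying nodes by graph id keeps equal vertex names in different graphs apart.
        a, b = (graph_id, v1), (graph_id, v2)
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Number the subgraphs 1, 2, ... in order of first appearance.
    labels: Dict[Node, int] = {}
    result: Dict[Node, int] = {}
    for node in parent:
        root = find(node)
        result[node] = labels.setdefault(root, len(labels) + 1)
    return result


single = identify_subgraphs([(None, "a", "b"), (None, "b", "c"), (None, "x", "y")])
multi = identify_subgraphs([(1, "a", "b"), (2, "a", "c")])
```

Because nodes are keyed by `(graph_id, vertex)`, the vertex `"a"` in graph 1 and the vertex `"a"` in graph 2 end up in different subgraphs.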
