Giter VIP home page Giter VIP logo

dbtplyr's Introduction

dbtplyr

This add-on package enhances dbt by providing macros which programmatically select columns based on their column names. It is inspired by the across() function and the select helpers in the R package dplyr.

dplyr (>= 1.0.0) has helpful semantics for selecting and applying transformations to variables based on their names. For example, if one wishes to take the sum of all variables with name prefixes of N and the mean of all variables with name prefixes of IND in the dataset mydata, they may write:

summarize(
  mydata, 
  across( starts_with('N'), sum),
  across( starts_with('IND', mean)
)

This package enables us to similarly write dbt data models with commands like:

{% set cols = dbtplyr.get_column_names( ref('mydata') ) %}
{% set cols_n = dbtplyr.starts_with('N', cols) %}
{% set cols_ind = dbtplyr.starts_with('IND', cols) %}

select

  {{ dbtplyr.across(cols_n, "sum({{var}}) as {{var}}_tot") }},
  {{ dbtplyr.across(cols_ind, "mean({{var}}) as {{var}}_avg") }}

from {{ ref('mydata') }}

which dbt then compiles to standard SQL.

Alternatively, to protect against cases where no column names matched the pattern provided (e.g. no variables start with n so cols_n is an empty list), one may instead internalize the final comma so that it is only compiled to SQL when relevant by using the final_comma parameter of across.

  {{ dbtplyr.across(cols_n, "sum({{var}}) as {{var}}_tot", final_comma = true) }}

Note that, slightly more dplyr-like, you may also write:

select

  {{ dbtplyr.across(dbtplyr.starts_with('N', ref('mydata')), "sum({{var}}) as {{var}}_tot") }},
  {{ dbtplyr.across(dbtplyr.starts_with('IND', ref('mydata')), "mean({{var}}) as {{var}}_avg") }}

from {{ ref('mydata') }}

But, as each function call is a bit longer than the equivalent dplyr code, I personally find the first form more readable.

Macros

The complete list of macros included are:

Functions to apply operation across columns

  • across(var_list, script_string, final_comma)
  • c_across(var_list, script_string)

Functions to evaluation condition across columns

  • if_any(var_list, script_string)
  • if_all(var_list, script_string)

Functions to subset columns by naming conventions

  • starts_with(string, relation or list)
  • ends_with(string, relation or list)
  • contains(string, relation or list)
  • not_contains(string, relation or list)
  • one_of(string_list, relation or list)
  • not_one_of(string_list, relation or list)
  • matches(string, relation)
  • everything(relation)
  • where(fn, relation) where fn is the string name of a Column type-checker (e.g. "is_number")

Note that all of the select-helper functions that take a relation as an argument can optionally be passed a list of names instead.

Documentation for these functions is available on the package website and in the macros/macro.yml file.

dbtplyr's People

Contributors

andrewcstewart avatar beadw avatar dah33 avatar emilyriederer avatar kowalks avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar

dbtplyr's Issues

noob question about macros.

Hi emily. thank you so much for putting this together. its a great package. I am a dplyr person at heart and a complete dbt nooob and I want to ask a pretty general question just to get the lay of the land so I don't write any macros that exist.

Are there other of these dbt macros libraries/collections out there that you recommend checking out?

Add ability to return either list of column names or Column objects

Per conversation in #5 , in some cases it is useful to have the list of Column objects and not just their names. However, the current workflow strongly assumes names are being returned.

We need to develop a strategy to avoid breaking changes but all the ability to return Column objects instead.

Log warning: deprecated config paths

On run:

06:00:37  [WARNING]: Deprecated functionality
The `source-paths` config has been renamed to `model-paths`. Please update your
`dbt_project.yml` configuration to reflect this change.
06:00:37  [WARNING]: Deprecated functionality
The `data-paths` config has been renamed to `seed-paths`. Please update your
`dbt_project.yml` configuration to reflect this change.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.