Giter VIP home page Giter VIP logo

dbt-model-usage's Introduction

dbt Model Usage

This dbt package provides tests to let you know whether your models are still relevant to your users. These tests scan your database's query logs to check if users are still SELECTing from the tables dbt produces. Test failures can let you know when it might be time to retire unused models.


Database Support

This package currently supports Google BigQuery and Snowflake.


Installation

  1. Add this package to your project's packages.yml
    packages:
      - package: rjh336/dbt_model_usage
        version: 0.1.2
  2. Update dependencies in your project
    $ dbt deps

Setup

In your project

You can configure this package via the vars config in your dbt_project.yml

# dbt_project.yml

vars:
  # BIGQUERY ONLY:
  model_usage_dbt_query_comment_pattern: 'my-custom-query-comment-regex'

  # SNOWFLAKE ONLY:
  model_usage_dbt_query_tag_pattern: 'my-custom-query-tag-regex'
  • model_usage_dbt_query_comment_pattern: Regular expression string used to find and EXCLUDE queries executed by dbt via the query-comment. Normally we would not consider these queries as 'user' queries since they might run every time models are built (e.g. tests and hooks). By default, this value is set to ^\/\*\s+\{"app"\:\s+"dbt".*. If your project uses a custom query-comment you might want to use your own pattern. If you prefer to count dbt-generated queries in your tests to indicate a model's relevance, then set this variable to ''.

  • model_usage_dbt_query_tag_pattern: Regular expression string used to find and EXCLUDE queries executed by dbt via the query_tag. If this variable is not defined then tagged queries will count as relevant user SELECT statements in the test results.

Required Permissions

BigQuery

Since the BigQuery implementation of these tests will query from the INFORMATION_SCHEMA.JOBS view, the Google Cloud user referenced in your profiles.yml must include the IAM permission bigquery.jobs.listAll.

Snowflake

dbt must have permission to query from the Account Usage and Information Schema views.


Available Tests

model_queried_within_last

Asserts that the target model has been queried by your users within a defined lookback time period, AND that the model was created at some point before the lookback period start time. "User queries" are defined as SELECT statements that were not executed by dbt.

Args:

  • num_units - [REQUIRED] Number of time units to look back from current time.
  • time_unit - [REQUIRED] The unit of time used for the lookback period. Can be one of: "day" | "hour" | "minute" | "second".
  • threshold - [OPTIONAL] If the model's user query count within the lookback period is below this number, the test fails. Default value is 1.

Limitations:

  • BigQuery: query jobs are retained for up to 180 days, so setting a lookback period greater than 180 days is not supported.
  • Snowflake: account query history is retained for up to 1 year, so setting a lookback period greater than 365 days is not supported.

Usage:

# properties.yml

version: 2

models:
  - name: some_model
    tests:
      - dbt_model_usage.model_queried_within_last:
          num_units: 30
          time_unit: day
          threshold: 1

^ This example fails if some_model was created at some point earlier than 30 days ago, AND there have been fewer than 1 user queries referencing some_model in the last 30 days.


column_queried_within_last

Asserts that the column in the target model has been directly referenced in your users' queries within a defined lookback time period, AND that the target model was created at some point before the lookback period start time. "User queries" are defined as SELECT statements that were not executed by dbt.

Args:

  • num_units - [REQUIRED] Number of time units to lookback from current time.
  • time_unit - [REQUIRED] The unit of time used for the lookback period. Can be one of: "day" | "hour" | "minute" | "second".
  • threshold - [OPTIONAL] If the column's user query count within the lookback period is below this number, the test fails. Default value is 1.

Limitations: same as for model_queried_within_last

Usage:

version: 2

models:
  - name: some_model
    columns:
      - name: some_column
        tests:
          - dbt_model_usage.column_queried_within_last:
              num_units: 10
              time_unit: day
              threshold: 1

^ This example fails if some_model was created at some point earlier than 10 days ago, AND there have been fewer than 1 user queries referencing some_model.some_column in the last 10 days.

dbt-model-usage's People

Contributors

craigwilson-zoe avatar rjh336 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.