
dbt_fundamentals's Introduction

I made this repository while studying dbt. Course link here.

Instead of the dataset used in the course, I decided to use Pokémon data. It's already available in the /seeds folder.

How to start?

Below you will find the steps to run the project, followed by some notes I took during the course lessons.

Dependency

To run this project you need Docker installed on your machine.

1. Start the postgres container

docker compose up -d postgres

2. How to run dbt?

docker compose run --rm dbt <command>

Commands reference.

3. Populating postgres with the seeds

To load all seeds: docker compose run --rm dbt seed

To load just one: docker compose run --rm dbt seed --select 'raw_artists'

If dbt fails because it cannot connect to Postgres, try changing the pokemon.host attribute in profiles.yml to localhost. WSL 2 users must keep pokemon.host set to host.docker.internal.
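For orientation, a Postgres profile in profiles.yml generally has the shape sketched below. This is only an illustration: the profile name `pokemon` is inferred from the attribute name above, the target name `dev` is an assumption, and the credentials, database, and schema are placeholders, so check the repository's actual profiles.yml before changing anything.

```yaml
# profiles.yml (sketch; only the two host options come from the notes above)
pokemon:                  # assumed profile name
  target: dev             # assumed target name
  outputs:
    dev:
      type: postgres
      # Try localhost if the connection fails; WSL 2 users keep host.docker.internal.
      host: host.docker.internal
      port: 5432
      user: postgres      # placeholder
      password: postgres  # placeholder
      dbname: postgres    # placeholder
      schema: public      # placeholder
      threads: 1
```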

4. Running docs

To serve the project documentation on port 8080, you need to run:

docker compose run --rm -p 8080:8080 dbt docs serve --port 8080

Study Annotations

Run dbt

To run only one model: docker compose run --rm dbt run --select <model_name>

Ps.: Only the model name is required, without the .sql extension.


Configuring materialization

To configure a model to be materialized as a table instead of a view, it's required to add this config at the top of the model file:

{{
    config(
        materialized='table'
    )
}}

However, it's also possible to configure it directly in the dbt_project.yml file.
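A minimal sketch of the project-level option, assuming the project is named `dbt_fundamentals` and has a hypothetical `marts` subfolder under models/ (both names are assumptions, not taken from the repository):

```yaml
# dbt_project.yml (fragment)
models:
  dbt_fundamentals:          # assumed project name; must match `name:` in this file
    +materialized: view      # default for every model in the project
    marts:                   # hypothetical subfolder under models/
      +materialized: table   # models in this folder are built as tables
```

A config block at the top of a model file takes precedence over these project-level settings, so individual models can still opt out.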


Seeds usage

Seeds should not be used to load raw data (for example, large CSV exports from a production database).

Since seeds are version controlled, they are best suited to files that contain business-specific logic, for example a list of country codes or user IDs of employees.


Naming conventions

Sources

  • the raw data that has already been loaded.

Staging

  • one to one with source tables

Intermediate

  • are the models between staging and final models
  • should always be built on staging models

Fact

  • things that are occurring or have occurred.
  • events, clicks, votes.

Dimension

  • people, place, user, etc.

Source usage

It's important to have a source file that declares the sources used by the staging models. If someone needs to rename a schema or a table, it's much easier to change it once in the source file than to update every staging file.

Ps.: The source name is used as the schema name by default. If the actual schema has a different name, you have to add the schema attribute to the source file.
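As a rough sketch, a source file with an explicit schema could look like this. The file path, the source name `pokemon`, the schema `public`, and the table name `raw_pokemon` are all assumptions made for illustration:

```yaml
# models/staging/sources.yml (hypothetical path)
version: 2

sources:
  - name: pokemon        # assumed source name, used in source('pokemon', ...)
    schema: public       # only needed when the schema differs from the source name
    tables:
      - name: raw_pokemon   # assumed table name
```

A staging model would then select from {{ source('pokemon', 'raw_pokemon') }} instead of hard-coding the schema and table name.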


TODO

  • Turn the seeds into an init.sql script executed by docker-compose when the PostgreSQL container starts. The tables created by init.sql should include a column with the current timestamp, to be used for checking data freshness. (It's a bad practice to keep huge .csv files as seeds. The only file here that makes sense as a seed is raw_cities.csv.)
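Once such a timestamp column exists, a source freshness check could be declared roughly as follows. This is a sketch under assumptions: the source and table names, the column name `loaded_at`, and the thresholds are all hypothetical.

```yaml
# Source definition with a freshness check (sketch)
version: 2

sources:
  - name: pokemon                  # assumed source name
    tables:
      - name: raw_pokemon          # assumed table name
        loaded_at_field: loaded_at # hypothetical timestamp column filled by init.sql
        freshness:
          warn_after: {count: 12, period: hour}   # example threshold
          error_after: {count: 24, period: hour}  # example threshold
```

The check would then be run with: docker compose run --rm dbt source freshness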
