
dbt-osmosis's Introduction

Hi there 👋

  • 🔭 I'm currently working on managing data systems at scale to deliver maximum business value at work, and on open source to maximize developer delight outside work!
  • 🌱 I'm currently learning more Rust
  • 👯 I'm looking to collaborate on anything dealing with analytical data systems
  • 💬 Ask me about Software Delivery
  • 📫 How to reach me: Linkedin
  • 😄 Pronouns: He/Him
  • ⚡ Fun fact: I love martial arts


dbt-osmosis's Issues

Feature request: Case insensitivity when checking for columns

Currently, if I have the following in _stg_my_schema.yml:

version: 2

models:
  - name: stg_my_schema__my_table
    description: ""
    columns:
      - name: id
      - name: _fivetran_synced

dbt-osmosis yaml refactor will remove _fivetran_synced and add _FIVETRAN_SYNCED, resulting in:

version: 2

models:
  - name: stg_my_schema__my_table
    description: ""
    columns:
      - name: id
      - name: _FIVETRAN_SYNCED

I'd like to request case insensitivity when looking for columns to add / remove.

I believe this applies more to DBs like Snowflake that display things in all caps by default (though the identifiers themselves are not actually in caps).

I'm definitely a Python rookie, but I tried changing
c for c in yaml_file_model_section["columns"] if c["name"] != column
to
c for c in yaml_file_model_section["columns"] if c["name"].lower() != column.lower()
in remove_columns_not_in_database()

and
if column.lower() not in model_columns:
to
if column.upper() not in model_columns:
in bootstrap_existing_model()

and it seemed to do the trick for me for now.
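Folded together, the change amounts to something like this (a sketch only; the yaml_file_model_section shape follows the snippets above):

def remove_columns_not_in_database(yaml_file_model_section, database_columns):
    # case-fold the database side once, then keep only YAML columns that match
    database_lower = {c.lower() for c in database_columns}
    yaml_file_model_section["columns"] = [
        c for c in yaml_file_model_section["columns"]
        if c["name"].lower() in database_lower
    ]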

Feature request: `dbt-osmosis yaml audit`

I think it would be pretty useful to have a command that runs the audit standalone.
For example, dbt-osmosis yaml audit -f development.test would produce the following output:

✅ Audit Report
-------------------------------

Database: awsdatacatalog
Schema: development
Table: test

Total Columns in Database: 3.0
Total Documentation Coverage: 100.0%

Action Log:
Columns Added to dbt: 2
Column Knowledge Inherited: 1
Extra Columns Removed: 0

Then one would be able to do whatever they want with it: post it as a comment on a PR, send it as a Slack message, etc.
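For instance, a CI job could capture the report and forward it (sketch of the workflow this enables; the yaml audit subcommand is the proposal above, not an existing command):

import subprocess

report = subprocess.run(
    ["dbt-osmosis", "yaml", "audit", "-f", "development.test"],
    capture_output=True, text=True, check=True,
).stdout
print(report)  # or post it as a PR comment / Slack message from here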

Question / request: Customize name of files

The docs mention I have a few different config options I can use for the name of my yml files, such as:

+dbt-osmosis: "folder.yml"
+dbt-osmosis: "schema/model.yml"

Is there any way to configure this further? For example, our project's convention is that source file names end in __sources.yml.

I'm not sure if there's already a way to do it. I tried a few things but was unsuccessful.

[bug] Add support for config: section in profiles.yml

dbt-osmosis 0.4.0 errors on workbench load with the trace below after selecting the profiles & project folders.
The config tag in profiles.yml is described in https://docs.getdbt.com/reference/profiles.yml - we use it to flag partial parsing.

Workaround: comment out the config tag in profiles.yml, at the cost of changing dbt behaviour.

  File "C:\Users\jenkins\AppData\Local\Programs\Python\Python39\lib\site-packages\streamlit\script_runner.py", line 354, in _run_script      
    exec(code, module.__dict__)
  File "C:\Users\jenkins\AppData\Local\Programs\Python\Python39\lib\site-packages\dbt_osmosis\app.py", line 167, in <module>
    [targ for targ in profile_data[prof]["outputs"]],
KeyError: 'outputs'
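A minimal sketch of a fix, assuming profile_data is the parsed profiles.yml mapping from the trace above: skip reserved top-level keys such as config before reaching for outputs.

RESERVED_PROFILE_KEYS = {"config"}

targets = {
    prof: list(profile_data[prof]["outputs"])
    for prof in profile_data
    if prof not in RESERVED_PROFILE_KEYS and "outputs" in profile_data[prof]
}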

Error: AttributeError: module 'dbt.config.profile' has no attribute 'DEFAULT_PROFILES_DIR'

I recently found out about dbt-osmosis and it seems really interesting! I was trying to test it out but am running into an issue with it not seeming to find my profiles.yml file. I installed via pip and tried running some commands but always get the following error:

Traceback (most recent call last):
  File "/usr/local/bin/dbt-osmosis", line 5, in <module>
    from dbt_osmosis.main import cli
  File "/usr/local/lib/python3.11/site-packages/dbt_osmosis/main.py", line 162, in <module>
    DEFAULT_PROFILES_DIR = dbt.config.profile.DEFAULT_PROFILES_DIR
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: module 'dbt.config.profile' has no attribute 'DEFAULT_PROFILES_DIR'

I attempted to use the --profiles-dir arg with no luck. My profiles.yml file is located in its default location, ~/.dbt.

dbt-core==1.4.5
dbt-osmosis==0.5.0
dbt-snowflake==1.4.2
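A defensive workaround sketch, assuming the attribute was simply removed in newer dbt-core versions and that ~/.dbt remains the conventional default:

import os

try:
    from dbt.config.profile import DEFAULT_PROFILES_DIR
except (ImportError, AttributeError):
    DEFAULT_PROFILES_DIR = os.path.expanduser("~/.dbt")  # dbt's documented default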

Bug: Extra linebreak before injected column and missing linebreak after

If I run dbt-osmosis yaml refactor on an existing source, and a column has been added to a table since the last time I ran it, it's added like this:

version: 2

sources:
  - name: my_source_name
    description: ""
    database: my_db_name
    schema: public
    tables:
      - name: my_table
        description: ""
        columns:
          - name: date
            description: ""

          - name: new_column_inserted_by_dbt_osmosis
      - name: my_next_table

Additional context
python 3.9.16
macOS 13.2 (22D49)
Snowflake 7.3.1
dbt-core==1.3.2
dbt-extractor==0.4.1
dbt-osmosis==0.9.8
dbt-snowflake==1.3.0

[feat] Macro Helpers

Leaving this as a note to myself for the next sprint.

Consider how we can use existing cross-db macros, like those present in dbt-audit-helper, for SQL engine diffs. See how granularly we can expose diffs.

Request: `yaml refactor`: Quote columns that use SQL keywords (such as `GROUP`)

Is your feature request related to a problem? Please describe.
Sometimes a column in a table may have a name like GROUP, ORDER, START, SCHEMA, or TABLE.
yaml refactor doesn't quote these columns, causing issues when dbt is run.

Describe the solution you'd like
I'd like yaml refactor to automatically quote column names when they conflict with database keywords

Describe alternatives you've considered
Manually modifying the templates with huge if x or y conditions to add quotes where needed.
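A minimal sketch of the requested behavior (RESERVED is a stand-in for a real per-dialect keyword list):

RESERVED = {"GROUP", "ORDER", "START", "SCHEMA", "TABLE"}

def quote_if_keyword(name: str) -> str:
    # quote only when the column name collides with a database keyword
    return f'"{name}"' if name.upper() in RESERVED else name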

When inheriting upstream column annotations, provide case-insensitive configuration.

I checked #4, which mentions the case issue. At the time I tried it, it did not work.

The following are some examples.

source yaml (models/raw_umc/source.yml, which I wrote manually):
sources:
  - name: raw_umc
    tables:
      - name: umc_user_device
        description: >
          A table that stores the association between users and devices.
        columns:
          - name: user_id
            data_type: string
            description: User ID, associated with the user table.
          - name: DEVICE_ID
            data_type: string
            description: device_master table incremental ID, device_master table incremental ID
          - name: updated_at
            data_type: string
            description: Last update time.

model:

select * from {{ source('raw_umc', 'umc_user_device') }}

After running dbt-osmosis yaml document models/stg --project-dir . --profiles-dir ., the stg yaml was generated successfully:

models/stg/schema/user_device.yml
version: 2
models:
  - name: bau_user_device
    columns:
      - name: USER_ID
        description: ''
      - name: DEVICE_ID
        description: 'device_master table incremental ID, device_master table incremental ID'
      - name: UPDATED_AT
        description: ''
sources: []

You can see that the upper-case USER_ID and UPDATED_AT fields in stg do not inherit the descriptions of the raw lower-case user_id and updated_at.
Note that I am using Snowflake, so the returned schema is in upper case.

Possibly useful logs:

INFO     🔬 Looking for actions for model.snowflake_dbt.bau_user_device                  osmosis.py:1006
INFO     ✨ Schema file is up to date for model model.snowflake_dbt.bau_user_device      osmosis.py:877
INFO     🔬 Looking for actions for model.snowflake_dbt.bau_user_device                  osmosis.py:1006
INFO     💡 Column DEVICE_ID is inheriting knowledge from the lineage of progenitor (source.snowflake_dbt.raw_umc.umc_user_device) for model model.snowflake_dbt.bau_user_device    osmosis.py:965
INFO     {'description': 'device_master table incremental ID, device_master table incremental ID'}    osmosis.py:974
INFO     ✨ Schema file /workspac/snowflake_dbt/models/stg/schema/bau_user_device.yml updated    osmosis.py:872
INFO     🔬 Looking for actions for model.snowflake_dbt.bau_user_partner

Question: how do I make the USER_ID and UPDATED_AT fields in stg inherit the descriptions of the lower-case user_id and updated_at fields in the upstream raw source? Thank you 🙏

Mocking resources in the dbt manifest?

I came across this package just yesterday and it looks like it will be incredibly useful. I'm particularly interested in being able to run SQL against my dbt project through the Python interface. I have a question...

Once we've loaded the project into memory like so

runner = DbtProject(
    project_dir=...,
    target="prod",
)

how easy do you think it would be to mock/patch models in the runner prior to running individual sql queries against it?

If this feature was available, combined with the provided ability to run sql against a pre-compiled version of the manifest, it would be quite straightforward to extend this package to support unit testing your dbt models.

I'm quite keen to take a look at how to do this myself, to be honest, as it's something I've been mulling over for a long time now! I'm just keen to hear whether this sounds doable before I dig in.
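Roughly what I have in mind (purely speculative; the unique_id is hypothetical, and raw_code / execute_sql are assumptions about the manifest schema and DbtProject API):

# swap a model's body in the in-memory manifest before querying it
node = runner.manifest.nodes["model.my_project.my_model"]  # hypothetical id
node.raw_code = "select 1 as id"  # dbt >= 1.3 stores model SQL as raw_code
result = runner.execute_sql("select * from {{ ref('my_model') }}")  # assumed method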

Support the `--profile` and `--vars` options

Motivation

I would like to specify a dbt profile by --profile and dbt variables by --vars.

First, we have multiple profiles and select one based on the running environment, and our default profile is a local development one. We would like to pass a profile to the CLI.

Second, our dbt models require variables set by the --vars option so that we can dynamically switch the Google Cloud project. We have different YAML files corresponding to the destination environments. We run a dbt command like dbt build --vars "$(cat vars-production.yml)", so we would like to pass dbt variables to dbt-osmosis too.

# vars-production.yml
projects:
  project-a: "project-a-production"

# vars-staging.yml
projects:
  project-a: "project-a-staging"

This is pseudocode to dynamically get the destination Google Cloud project ID:

{% set gcp_project = var('projects')['project-a'] %}

{{
  config(
    materialized="table",
    database=gcp_project,
    schema="dwh_ai_inquiry",
    alias="answer",
  )
}}

SELECT ...

How do you run dbt-power-user with dbt-osmosis?

In this issue, @z3z1ma, you say that the preferred method of development is with the dbt-server, and in the server docs you imply that you can use the server with dbt-power-user. I've tried to stitch the solution together but am coming up short.

Maybe you could provide an explanation of how to make the integration work and update the docs? Right now dbt-power-user points to this repo for how to set up a dbt REPL, so making that part of the docs clearer could help other people as well.

Support Documentation of Struct Fields in BigQuery

Proposed Behavior

When a user runs dbt-osmosis yaml document, it will also pull in and preserve struct-level documentation of fields in BigQuery projects.

Current behavior

When there are struct fields in the schema.yml or source.yml, the process removes any struct documentation that is specified as a field like name: struct.struct_field.

Specifications

  • dbt version: 1.2.0
  • python version: 3.8.10

Docs overhaul

Documentation drives adoption and usage; we should improve it for the layperson who just wants to run diffs, the workbench, or schema management easily.

AttributeError: 'NoneType' object has no attribute 'resolve' on yaml commands

Hey there!

I'm trying to make this work and it fails right away; I'm not sure if I'm doing something wrong. Currently no yaml files with schemas exist in the project. All three yaml commands throw this same error.

Error:

dbt-osmosis yaml refactor
INFO     🌊 Executing dbt-osmosis                                    main.py:132

12:13:01  Unable to do partial parsing because config vars, config profile, or config target have changed
INFO     📈 Searching project stucture for required updates and building action plan    osmosis.py:480
INFO     ...building project structure mapping in memory             osmosis.py:377
Traceback (most recent call last):
  File "/Users/pabloseibelt/dev/dwh-workflows/transform/dbt/src/venv/bin/dbt-osmosis", line 8, in <module>
    sys.exit(cli())
  File "/Users/pabloseibelt/dev/dwh-workflows/transform/dbt/src/venv/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/Users/pabloseibelt/dev/dwh-workflows/transform/dbt/src/venv/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/Users/pabloseibelt/dev/dwh-workflows/transform/dbt/src/venv/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/pabloseibelt/dev/dwh-workflows/transform/dbt/src/venv/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/pabloseibelt/dev/dwh-workflows/transform/dbt/src/venv/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/pabloseibelt/dev/dwh-workflows/transform/dbt/src/venv/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/Users/pabloseibelt/dev/dwh-workflows/transform/dbt/src/venv/lib/python3.10/site-packages/dbt_osmosis/main.py", line 81, in wrapper
    return func(*args, **kwargs)
  File "/Users/pabloseibelt/dev/dwh-workflows/transform/dbt/src/venv/lib/python3.10/site-packages/dbt_osmosis/main.py", line 143, in refactor
    if runner.commit_project_restructure_to_disk():
  File "/Users/pabloseibelt/dev/dwh-workflows/transform/dbt/src/venv/lib/python3.10/site-packages/dbt_osmosis/core/osmosis.py", line 510, in commit_project_restructure_to_disk
    blueprint = self.draft_project_structure_update_plan()
  File "/Users/pabloseibelt/dev/dwh-workflows/transform/dbt/src/venv/lib/python3.10/site-packages/dbt_osmosis/core/osmosis.py", line 486, in draft_project_structure_update_plan
    for unique_id, schema_file in self.build_schema_folder_mapping().items():
  File "/Users/pabloseibelt/dev/dwh-workflows/transform/dbt/src/venv/lib/python3.10/site-packages/dbt_osmosis/core/osmosis.py", line 385, in build_schema_folder_mapping
    current=schema_path.resolve(),
AttributeError: 'NoneType' object has no attribute 'resolve'

Config:

models:
  mydwh:
    +dbt-osmosis: "schema/{model}.yml"

dbt:

dbt --version
Core:
  - installed: 1.4.5
  - latest:    1.4.5 - Up to date!

Plugins:
  - snowflake: 1.4.2 - Up to date!

dbt-osmosis:

dbt-osmosis --version
dbt-osmosis, version 0.11.8

OS: Mac OS X Ventura 13.2.1

Thanks!!

Feature request - Add ability to "skip" adding all columns to models

The default feature of adding all columns to models is great but it can generate a huge diff in the YML files.

I think it would be useful to provide an option where documented columns are "pushed down" to downstream models but non-documented columns are not added. I see it as a potential first step toward better docs without being too overwhelming.

[feat] Optimize Workbench for 2 Usage Patterns

Let's reduce the scope of our workbench to a hyper-tailored experience for:
A) Creating a new model
B) Editing an existing model

There should (probably) be two separate but similar layouts for this.

Add metadata to connect source models to "staging" models to enable more robust inheritance

Given the following, when I run yaml refactor, the description from the source table addresses doesn't bubble up to the stg_some_schema__addresses staging model in _stg_some_schema.yml (even though the description of the ID column does):

_this_db__some_schema__sources.yml:

version: 2

sources:
  - name: this_db__some_schema__source
    database: this_db
    schema: some_schema
    tables:
      - name: addresses
        description: 'this table has addresses'
        columns:
          - name: ID
            description: 'this is the ID'

_stg_some_schema.yml:

version: 2

models:
  - name: stg_some_source__addresses
    columns:
      - name: ID
        description: this is the ID

stg_some_schema__addresses.sql:

SELECT
	ID
,	NAME
,	ADDRESS_LINE1
,	ADDRESS_LINE2
,	CITY
,	DISTRICT_OR_COUNTY
,	STATE_CODE
,	COUNTRY_CODE
,	POSTAL_CODE
,	CREATED_AT
,	UPDATED_AT
,	_FIVETRAN_SYNCED

FROM
	{{ source("this_db__some_schema__source", "addresses") }}

I'm doing one .yml file per source, and one .yml file per corresponding staging schema

Python 3.10.10
dbt-core==1.4.5
dbt-extractor==0.4.1
dbt-osmosis==0.11.14
dbt-snowflake==1.4.2

Feature Request: Snowflake: Send queries in batches for significant performance gain

Related to #38

It seems like running yaml refactor sends the DESCRIBE TABLE ... statements to Snowflake sequentially, one by one, as it goes. It'd be great if this went a lot faster.

It also seems to execute show terse objects in [db_name].[schema_name] every single time it runs describe table, when this could probably be executed just once upfront.

Describe the solution you'd like
Would it be reasonable to queue up all those database statements up front and run through them as the results return, so that the whole run completes much more quickly?
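As a rough sketch (assuming the adapter can hand out concurrent connections), the per-table work could be fanned out instead of run serially:

from concurrent.futures import ThreadPoolExecutor

def fetch_columns(adapter, relations, workers=8):
    # run the per-relation DESCRIBE work concurrently and collect the results
    with ThreadPoolExecutor(max_workers=workers) as pool:
        columns = pool.map(adapter.get_columns_in_relation, relations)
    return dict(zip(relations, columns))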

Describe alternatives you've considered
Can't really think of any apart from just using it as it is and waiting much longer.

Additional context
python 3.9.16
macOS 13.2 (22D49)
Snowflake 7.3.1
dbt-core==1.3.2
dbt-extractor==0.4.1
dbt-osmosis==0.9.8
dbt-snowflake==1.3.0

dbt.contracts.graph.parsed.ColumnInfo moved to dbt.contracts.graph.nodes.ColumnInfo in dbt 1.4

Hi,
I tried to use dbt-osmosis with dbt 1.4 and got the following error.

hashiba@:~/work/dbt/jaffle_shop$ dbt-osmosis run --project-dir ~/work/dbt/jaffle_shop/ --target prod
Traceback (most recent call last):
  File "/Users/hashiba/.pyenv/versions/3.9.6/bin/dbt-osmosis", line 5, in <module>
    from dbt_osmosis.main import cli
  File "/Users/hashiba/.pyenv/versions/3.9.6/lib/python3.9/site-packages/dbt_osmosis/main.py", line 15, in <module>
    from dbt_osmosis.core.diff import diff_and_print_to_console
  File "/Users/hashiba/.pyenv/versions/3.9.6/lib/python3.9/site-packages/dbt_osmosis/core/diff.py", line 10, in <module>
    from dbt_osmosis.core.osmosis import DbtProject
  File "/Users/hashiba/.pyenv/versions/3.9.6/lib/python3.9/site-packages/dbt_osmosis/core/osmosis.py", line 49, in <module>
    from dbt.contracts.graph.parsed import ColumnInfo
ModuleNotFoundError: No module named 'dbt.contracts.graph.parsed'

This is because dbt.contracts.graph.parsed.ColumnInfo moved to dbt.contracts.graph.nodes.ColumnInfo in dbt 1.4.
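A compatibility shim based on the move described above could look like:

try:
    from dbt.contracts.graph.nodes import ColumnInfo  # dbt-core >= 1.4
except ImportError:
    from dbt.contracts.graph.parsed import ColumnInfo  # dbt-core < 1.4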

I hope the next version will support this.

AttributeError 'NoneType' object has no attribute 'current'

I'm having this error:

    if SCHEMA_FILE.current is None:
AttributeError: 'NoneType' object has no attribute 'current'

Looking at the code in main.py and app.py, I've noticed a few things.

Here is an extract of the content of the "project" variable (a dict):
{'name': 'MYPROJ', 'version': '1.0.1', 'project-root': 'my/path/MYPROJ/dbt', ....}

But in the script, the name and root path of the project are read like this:
project.project_name instead of project.name
project.project_root instead of project.project-root

And SCHEMA_FILE is None because the proj variable, which should contain the project.project_name value (MYPROJ), instead holds my/path/MYPROJ/dbt,
while node.get("package_name") is MYPROJ.
So the test node.get("package_name") == proj is always False, and SCHEMA_FILE is logically None.

AttributeError: 'CompiledSqlNode' object has no attribute 'compiled_sql'

Hello,
I'm getting the following error:
AttributeError: 'CompiledSqlNode' object has no attribute 'compiled_sql'
Traceback:
File "/home/dbt/venv_dbt/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 562, in _run_script
    exec(code, module.__dict__)
File "/home/dbt/venv_dbt/lib/python3.10/site-packages/dbt_osmosis/app.py", line 327, in <module>
    if compile_sql(state[RAW_SQL]) != state[COMPILED_SQL]:
File "/home/dbt/venv_dbt/lib/python3.10/site-packages/dbt_osmosis/app.py", line 197, in compile_sql
    return ctx.compile_sql(sql)
File "/home/dbt/venv_dbt/lib/python3.10/site-packages/dbt_osmosis/core/osmosis.py", line 367, in compile_sql
    return self.get_compiled_node(sql).compiled_sql

Any idea how to fix that?

IPv6 issue

Good day,

When trying to run the server within docker, I'm getting the error

ERROR:    [Errno 99] error while attempting to bind on address ('::1', 8581, 0, 0): cannot assign requested address

I'm guessing this is because the program binds IPv6 instead of IPv4. I was wondering if it would be possible to support IPv4 as well.
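A minimal illustration of the difference ('::1' is the IPv6 loopback, which a container without IPv6 cannot bind, while the IPv4 loopback works):

import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # AF_INET = IPv4
sock.bind(("127.0.0.1", 8581))  # succeeds where ('::1', 8581) fails without IPv6
sock.close()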

Thanks!

'DbtYamlManager' object has no attribute 'yaml', No module named 'dbt_osmosis'

Hi there--thanks for the great tool!

I'm getting an AttributeError when running dbt-osmosis compose and dbt-osmosis run. See the stacktrace below. My dbt_project.yml is attached as well.

> dbt-osmosis run -f acdw.dimensions.acdw__dimensions__item
INFO     🌊 Executing dbt-osmosis                                    main.py:78

INFO     📈 Searching project stucture for required updates and building action plan    osmosis.py:907
INFO     ...building project structure mapping in memory             osmosis.py:885
INFO     [('CREATE', '->', WindowsPath('C:/my_project_path/models/acdw/dimensions/schema/acdw__dimensions__item.yml'))]    osmosis.py:1027
INFO     👷 Executing action plan and conforming projecting schemas to defined structure    osmosis.py:977
INFO     🚧 Building schema file acdw__dimensions__item.yml          osmosis.py:983
Traceback (most recent call last):
  File "C:\Users\me\Anaconda3\envs\dbt\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\me\Anaconda3\envs\dbt\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\me\Anaconda3\envs\dbt\Scripts\dbt-osmosis.exe\__main__.py", line 7, in <module>
  File "C:\Users\me\Anaconda3\envs\dbt\lib\site-packages\click\core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "C:\Users\me\Anaconda3\envs\dbt\lib\site-packages\click\core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "C:\Users\me\Anaconda3\envs\dbt\lib\site-packages\click\core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "C:\Users\me\Anaconda3\envs\dbt\lib\site-packages\click\core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\Users\me\Anaconda3\envs\dbt\lib\site-packages\click\core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "C:\Users\me\Anaconda3\envs\dbt\lib\site-packages\dbt_osmosis\main.py", line 89, in run
    if runner.commit_project_restructure_to_disk():
  File "C:\Users\me\Anaconda3\envs\dbt\lib\site-packages\dbt_osmosis\core\osmosis.py", line 987, in commit_project_restructure_to_disk
    self.yaml.dump(structure.output, target)
AttributeError: 'DbtYamlManager' object has no attribute 'yaml'

Possibly relatedly, I'm not able to run dbt-osmosis workbench, either. Apparently, dbt_osmosis can't find itself.

> dbt-osmosis workbench
INFO     🌊 Executing dbt-osmosis                                    main.py:406
[...]
2022-10-06 17:41:59.975 Pandas backend loaded 1.4.2
2022-10-06 17:41:59.981 Numpy backend loaded 1.21.5
2022-10-06 17:41:59.982 Pyspark backend NOT loaded
2022-10-06 17:41:59.983 Python backend loaded
2022-10-06 17:42:01.020 Uncaught app exception
Traceback (most recent call last):
  File "C:\Users\me\Anaconda3\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 562, in _run_script
    exec(code, module.__dict__)
  File "C:\Users\me\Anaconda3\envs\dbt\Lib\site-packages\dbt_osmosis\app.py", line 17, in <module>
    from dbt_osmosis.core.osmosis import DEFAULT_PROFILES_DIR, DbtProject
ModuleNotFoundError: No module named 'dbt_osmosis'

dbt_project.zip

Support inheriting doc-blocks in description

Hi, thanks for writing this tool - it looks like just what I want :)

I found a possible improvement: when a description contains a doc-block, it would be nice if the inherited downstream descriptions re-used the reference instead of injecting the contents of the doc block.

Example:

models:
- name: first
  columns:
  - name: column_a
    description: '{{ doc("column_a_description") }}'

will, in a child model named second, be rendered into

models:
- name: second
  columns:
  - name: column_a
    description: "This text is available in the column_a_description.md file.\r\n\r\nWith a newline"

If the reference is kept intact, it will be very clear that the place to edit the description is somewhere else.
Also, the current injection inserts a lot of newlines, which makes the rendered yml ugly. Including the reference instead of the contents would solve both of these nuisances.
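A sketch of the behavior I'm suggesting (the regex and function are illustrative, not the tool's actual internals):

import re

DOC_REF = re.compile(r"\{\{\s*doc\([^)]*\)\s*\}\}")

def inheritable_description(raw: str, rendered: str) -> str:
    # pass the doc() reference through unrendered so children keep the pointer
    return raw if DOC_REF.search(raw) else rendered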

dbt-osmosis breaks sqlfluff linting

I have a virtual Python environment with Python version 3.10.6.

My requirements.txt looks like this:

dbt-core==1.3.2
dbt-postgres==1.3.2
sqlfluff==1.4.5
sqlfluff-templater-dbt==1.4.5
dbt-osmosis==0.9.8

When I now run a sqlfluff lint in that Python environment, I get the error below.

=== [dbt templater] Sorting Nodes...
Traceback (most recent call last):
  File "/Users/frank.tubbing/source/repos/data-warehouse-etl-dbt/venv/bin/sqlfluff", line 8, in <module>
    sys.exit(cli())
  File "/Users/frank.tubbing/source/repos/data-warehouse-etl-dbt/venv/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/Users/frank.tubbing/source/repos/data-warehouse-etl-dbt/venv/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/Users/frank.tubbing/source/repos/data-warehouse-etl-dbt/venv/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/frank.tubbing/source/repos/data-warehouse-etl-dbt/venv/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/frank.tubbing/source/repos/data-warehouse-etl-dbt/venv/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/Users/frank.tubbing/source/repos/data-warehouse-etl-dbt/venv/lib/python3.10/site-packages/sqlfluff/cli/commands.py", line 582, in lint
    result = lnt.lint_paths(
  File "/Users/frank.tubbing/source/repos/data-warehouse-etl-dbt/venv/lib/python3.10/site-packages/sqlfluff/core/linter/linter.py", line 1164, in lint_paths
    for i, linted_file in enumerate(runner.run(expanded_paths, fix), start=1):
  File "/Users/frank.tubbing/source/repos/data-warehouse-etl-dbt/venv/lib/python3.10/site-packages/sqlfluff/core/linter/runner.py", line 106, in run
    for fname, partial in self.iter_partials(fnames, fix=fix):
  File "/Users/frank.tubbing/source/repos/data-warehouse-etl-dbt/venv/lib/python3.10/site-packages/sqlfluff/core/linter/runner.py", line 59, in iter_partials
    for fname, rendered in self.iter_rendered(fnames):
  File "/Users/frank.tubbing/source/repos/data-warehouse-etl-dbt/venv/lib/python3.10/site-packages/sqlfluff/core/linter/runner.py", line 42, in iter_rendered
    for fname in self.linter.templater.sequence_files(
  File "/Users/frank.tubbing/source/repos/data-warehouse-etl-dbt/venv/lib/python3.10/site-packages/sqlfluff_templater_dbt/templater.py", line 280, in sequence_files
    for key, node in self.dbt_manifest.nodes.items():
  File "/Users/frank.tubbing/.asdf/installs/python/3.10.6/lib/python3.10/functools.py", line 981, in __get__
    val = self.func(instance)
  File "/Users/frank.tubbing/source/repos/data-warehouse-etl-dbt/venv/lib/python3.10/site-packages/sqlfluff_templater_dbt/templater.py", line 154, in dbt_manifest
    self.dbt_manifest = ManifestLoader.get_full_manifest(self.dbt_config)
  File "/Users/frank.tubbing/source/repos/data-warehouse-etl-dbt/venv/lib/python3.10/site-packages/dbt/parser/manifest.py", line 206, in get_full_manifest
    adapter = get_adapter(config)  # type: ignore
  File "/Users/frank.tubbing/source/repos/data-warehouse-etl-dbt/venv/lib/python3.10/site-packages/dbt_osmosis/core/osmosis.py", line 12, in <lambda>
    dbt.adapters.factory.get_adapter = lambda config: config.adapter
AttributeError: 'RuntimeConfig' object has no attribute 'adapter'

If I uninstall dbt-osmosis from that virtual environment, sqlfluff works as it should. Could someone please take a look at why this breaks?

Is this project dead?

I know there have been a lot of changes in the SQLFluff, dbt, and VSCode world that have had an effect on dbt-osmosis's compatibility. Are we planning to update the project, or should we assume it's not getting forward compatibility?

dbt-osmosis refactor not working with models that are dependent on seeds

Hello,

As stated in the issue title, I am having issues running yaml refactor for dbt models that reference dbt seeds. The error I am receiving:
Error occurred while processing model model.test_project.test_model: 'SeedNode' object has no attribute 'depends_on'
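A guard along these lines would probably avoid the crash (a sketch only, assuming the standard manifest node shapes):

def upstream_ids(node):
    # seeds have no upstream dependencies, so treat a missing depends_on as empty
    deps = getattr(node, "depends_on", None)
    return list(deps.nodes) if deps else []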

dbt version: 1.4.4

Installed packages:

  • package: calogica/dbt_expectations
    version: [">=0.8.0", "<0.9.0"]
  • package: calogica/dbt_date
    version: [">=0.7.0", "<0.8.0"]
  • package: Snowflake-Labs/dbt_constraints
    version: [">=0.5.0", "<0.6.0"]
  • package: dbt-labs/segment
    version: [">=0.9.0", "<1.0.0"]
  • package: dbt-labs/codegen
    version: [">=0.9.0", "<1.0.0"]
  • package: brooklyn-data/dbt_artifacts
    version: [">=2.2.0", "<2.3.0"]
  • package: dbt-labs/dbt_external_tables
    version: [">=0.8.0", "<1.0.0"]

Let me know if there's anything I can help you with and keep up the good work! Thanks.

AttributeError: 'DbtOsmosis'

When a user runs dbt-osmosis diff -m some_model ... they are getting AttributeError: 'DbtOsmosis':

dbt-osmosis diff -m some_model --project-dir /path/to/dbt/transformations --profiles-dir /path/to/dbt/transformations/profiles --target target0

INFO     🌊 Executing dbt-osmosis

INFO     Injecting macros, please wait...
Traceback (most recent call last):
  File "/Users/myuser/.pyenv/versions/3.10.7/bin/dbt-osmosis", line 8, in <module>
    sys.exit(cli())
.....
AttributeError: 'DbtOsmosis' object has no attribute '_get_exec_node'. Did you mean: 'get_ref_node'?

Note

Please note that there is no error when running the dbt-osmosis workbench - that one runs nicely.

Configuration:

  • dbt version: 1.2.0
  • python version: Python 3.10.7

`compiled_sql` renamed to `compiled_code` from dbt 1.3 onwards

Hi! I just wanted to mention that from dbt version 1.3 onwards, the field compiled_sql from manifest.json has been renamed to compiled_code (to be relevant for Python models as well).

I tried the integration of dbt-osmosis into dbt PowerUser on a repo with 1.3b and got errors when I wanted to preview the data.

Happy to create a PR to cater for compiled_code if you want me to!
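A compatibility shim could be as small as this (sketch only):

def compiled_text(node):
    # dbt >= 1.3 renamed compiled_sql to compiled_code; support both
    return getattr(node, "compiled_code", None) or getattr(node, "compiled_sql", None)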

Cannot run `dbt-osmosis==0.10.3` on `dbt-core>=1.3,<1.4`

I created and activated a fresh virtual environment with dbt-osmosis==0.10.3 and dbt-core>=1.3,<1.4 (currently resolving to dbt-core==1.3.3) with the following command:

pyenv virtualenv 3.9.7 dbt-osmosis-test && pyenv activate $_ && pip install "dbt-osmosis==0.10.3" "dbt-core>=1.3,<1.4"

However, upon running various dbt-osmosis commands (e.g. dbt-osmosis --version, dbt-osmosis --help), I get the following error:

dbt-osmosis --help
Traceback (most recent call last):
  File "/Users/waligob/.pyenv/versions/dbt-osmosis-test/bin/dbt-osmosis", line 5, in <module>
    from dbt_osmosis.main import cli
  File "/Users/waligob/.pyenv/versions/3.9.7/envs/dbt-osmosis-test/lib/python3.9/site-packages/dbt_osmosis/main.py", line 17, in <module>
    from dbt_osmosis.core.macros import inject_macros
  File "/Users/waligob/.pyenv/versions/3.9.7/envs/dbt-osmosis-test/lib/python3.9/site-packages/dbt_osmosis/core/macros.py", line 4, in <module>
    from dbt_osmosis.core.osmosis import DbtProject
  File "/Users/waligob/.pyenv/versions/3.9.7/envs/dbt-osmosis-test/lib/python3.9/site-packages/dbt_osmosis/core/osmosis.py", line 22, in <module>
    from dbt.contracts.graph.parsed import ColumnInfo, ManifestNode  # type: ignore
ImportError: cannot import name 'ManifestNode' from 'dbt.contracts.graph.parsed' (/Users/waligob/.pyenv/versions/3.9.7/envs/dbt-osmosis-test/lib/python3.9/site-packages/dbt/contracts/graph/parsed.py)

Error: `TypeError: cannot pickle '_thread.RLock' object`

Loving how much this project is charging ahead!

Was forced to be away from it for a while due to some fires to put out. Anxious to explore improvements and new features.

One thing I've always encountered (including on version 0.11.14) is getting:
TypeError: cannot pickle '_thread.RLock' object
randomly when trying to execute yaml refactor.

Traceback:

dbt-osmosis yaml refactor --fqn staging.[MY_FOLDER_NAME_REDACTED]
INFO     🌊 Executing dbt-osmosis                                    main.py:141

Traceback (most recent call last):
  File "[MY_FILE_PATH_REDACTED]/dbt_oc__venv/bin/dbt-osmosis", line 8, in <module>
    sys.exit(cli())
  File "[MY_FILE_PATH_REDACTED]/dbt_oc__venv/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "[MY_FILE_PATH_REDACTED]/dbt_oc__venv/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "[MY_FILE_PATH_REDACTED]/dbt_oc__venv/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "[MY_FILE_PATH_REDACTED]/dbt_oc__venv/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "[MY_FILE_PATH_REDACTED]/dbt_oc__venv/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "[MY_FILE_PATH_REDACTED]/dbt_oc__venv/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "[MY_FILE_PATH_REDACTED]/dbt_oc__venv/lib/python3.10/site-packages/dbt_osmosis/main.py", line 81, in wrapper
    return func(*args, **kwargs)
  File "[MY_FILE_PATH_REDACTED]/dbt_oc__venv/lib/python3.10/site-packages/dbt_osmosis/main.py", line 143, in refactor
    runner = DbtYamlManager(
  File "[MY_FILE_PATH_REDACTED]/dbt_oc__venv/lib/python3.10/site-packages/dbt_osmosis/core/osmosis.py", line 119, in __init__
    super().__init__(  # partial parse messes up our f strings, so force a full parse on init
  File "[MY_FILE_PATH_REDACTED]/dbt_oc__venv/lib/python3.10/site-packages/dbt_osmosis/vendored/dbt_core_interface/project.py", line 409, in __init__
    self.parse_project(init=True)
  File "[MY_FILE_PATH_REDACTED]/dbt_oc__venv/lib/python3.10/site-packages/dbt_osmosis/vendored/dbt_core_interface/project.py", line 493, in parse_project
    self.manifest = _project_parser.load()
  File "[MY_FILE_PATH_REDACTED]/dbt_oc__venv/lib/python3.10/site-packages/dbt/parser/manifest.py", line 351, in load
    self.parse_project(
  File "[MY_FILE_PATH_REDACTED]/dbt_oc__venv/lib/python3.10/site-packages/dbt/parser/manifest.py", line 479, in parse_project
    parser.parse_file(block)
  File "[MY_FILE_PATH_REDACTED]/dbt_oc__venv/lib/python3.10/site-packages/dbt/parser/base.py", line 414, in parse_file
    self.parse_node(file_block)
  File "[MY_FILE_PATH_REDACTED]/dbt_oc__venv/lib/python3.10/site-packages/dbt/parser/base.py", line 388, in parse_node
    self.render_update(node, config)
  File "[MY_FILE_PATH_REDACTED]/dbt_oc__venv/lib/python3.10/site-packages/dbt/parser/models.py", line 306, in render_update
    model_parser_copy = self.partial_deepcopy()
  File "[MY_FILE_PATH_REDACTED]/dbt_oc__venv/lib/python3.10/site-packages/dbt/parser/models.py", line 461, in partial_deepcopy
    return ModelParser(deepcopy(self.project), self.manifest, deepcopy(self.root_project))
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/copy.py", line 271, in _reconstruct
    state = deepcopy(state, memo)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/copy.py", line 271, in _reconstruct
    state = deepcopy(state, memo)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/copy.py", line 271, in _reconstruct
    state = deepcopy(state, memo)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/copy.py", line 161, in deepcopy
    rv = reductor(4)
TypeError: cannot pickle '_thread.RLock' object

I usually just have to keep trying to execute and eventually it'll work. Sometimes I'll quit my terminal app and force quit Python from the Activity Monitor and this seems to work pretty reliably.

[feat] Include SQL Fluff Actions

Let's add controls to interactively lint and auto-fix the SQL our users are authoring in the workbench.
We want to continue to peel away the layers of separation between value creation and ensuring quality of work.

Using dbt query_comment breaks project registration

Symptoms:
My dbt installation uses the query-comment config to add labels to BigQuery queries. In our dbt_project.yml we have

query-comment:
  comment: "{{ query_comment(node) }}"
  job-label: true

When trying to run dbt-osmosis server serve ..., the server throws a 500 Could not connect to Database error.

Diagnosis:
Tracing through the code, I reached parse_project in osmosis.py. This inits the DB adapter, loads the manifest, and importantly calls save_macros_to_adapter. BUT... before we get to saving the macros, the adapter setter fn calls _verify_connection, which tries to query the DB before the adapter is fully initialised...

The adapter then errors in this chunk of the dbt-bigquery connection code:

if (
    hasattr(self.profile, "query_comment")
    and self.profile.query_comment
    and self.profile.query_comment.job_label
):
    query_comment = self.query_header.comment.query_comment
    labels = self._labels_from_query_comment(query_comment)

failing with 'NoneType' object has no attribute 'comment' on line 6 of the snippet.
Now this code looks ugly - it's checking the contents of self.profile, then trying to access self.query_header - clearly a massive assumption is made here...

So - we have a conflict. dbt-osmosis is trying to run a query without ensuring the adapter is completely configured - and dbt-bigquery is doing a really wonky test. I'm happy to PR a deferred verify process in dbt-osmosis, or if you consider this to be more of a dbt-bigquery bug, I'll raise it with them.

Can't run `workbench` - No such file or directory: `streamlit`

Hi,
I installed dbt-osmosis and tried to execute the dbt-osmosis workbench command,
but I get an error: No such file or directory: 'streamlit'.

Here is the full error message:

INFO     🌊 Executing dbt-osmosis                                    main.py:390
Traceback (most recent call last):
  File "/Users/galpolak/Library/Python/3.9/bin/dbt-osmosis", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/galpolak/Library/Python/3.9/lib/python/site-packages/dbt_osmosis/main.py", line 414, in workbench
    subprocess.run(
  File "/usr/local/Cellar/[email protected]/3.9.13_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/subprocess.py", line 505, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/usr/local/Cellar/[email protected]/3.9.13_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/subprocess.py", line 951, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/local/Cellar/[email protected]/3.9.13_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/subprocess.py", line 1821, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'streamlit'

Bug: Can't run with dbt-core > 1.3.2 - errors immediately after install

This repo looks awesome! Exactly what I've been looking for, and indeed makes you wonder why it isn't in dbt-core.

I'd absolutely love to use it, but after running pip install dbt-osmosis in a fresh virtual environment with a fresh (working) clone of the repo, I get the following when I try to execute dbt-osmosis yaml refactor:

Traceback (most recent call last):
  File "/Users/[my_user_name]/dbt_osmosis__venv/bin/dbt-osmosis", line 5, in <module>
    from dbt_osmosis.main import cli
  File "/Users/jaredchristensen/dbt_osmosis__venv/lib/python3.9/site-packages/dbt_osmosis/main.py", line 15, in <module>
    from dbt_osmosis.core.diff import diff_and_print_to_console
  File "/Users/jaredchristensen/dbt_osmosis__venv/lib/python3.9/site-packages/dbt_osmosis/core/diff.py", line 10, in <module>
    from dbt_osmosis.core.osmosis import DbtProject
  File "/Users/jaredchristensen/dbt_osmosis__venv/lib/python3.9/site-packages/dbt_osmosis/core/osmosis.py", line 49, in <module>
    from dbt.contracts.graph.parsed import ColumnInfo
ModuleNotFoundError: No module named 'dbt.contracts.graph.parsed'

I first ran dbt docs generate just in case it needed the manifest or some other dbt-generated files.

Is it just that dbt-osmosis can only work with older versions of dbt?

Python 3.9.16 (also tried with Python 3.10.9 with the same versions as below)

dbt-core==1.4.1
dbt-extractor==0.4.1
dbt-osmosis==0.9.8
dbt-snowflake==1.4.0

macOS 13.1 (22C65)
Snowflake 7.3.0

[feat] case insensitivity

In our environment people write everything in lower case, including both the SQL and the YML. It looks like the Snowflake adapter consistently returns column names as upper case. Can you provide an option to ignore case when comparing column names between the YML version and the DB?

I "fixed" this locally by changing this line to read c.name.lower() in the obvious place:

columns = [c.name for c in adapter.get_columns_in_relation(table)]

But you might instead want to keep that case and modify all the places where you compare columns to do a case-insensitive comparison. It looks like you're using Python sets for the different column lists. I don't know how you'd make them case-insensitive, so just adding the lower(), maybe as a configuration option, would be great.
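For example, a case-folded lookup keeps the set arithmetic intact while ignoring case (a sketch; adapter, table, and yaml_column_names stand in for the real variables):

db_cols = {c.name.lower(): c.name for c in adapter.get_columns_in_relation(table)}
yml_cols = {name.lower(): name for name in yaml_column_names}
undocumented = [db_cols[k] for k in db_cols.keys() - yml_cols.keys()]
extra = [yml_cols[k] for k in yml_cols.keys() - db_cols.keys()]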

Feature request - Support offline ingestion of `manifest.json` and `catalog.json`

I think that the ability to use existing manifest.json and catalog.json would be a great way to run the yaml refactor "offline", without requiring access to the Warehouse.

In the case of large projects, I am actually wondering if this wouldn't speed up the yaml refactoring process significantly (instead of querying the columns of each model one by one).

It could be especially useful where someone wants to run dbt-osmosis as a CI/CD step without storing their warehouse credentials in GitHub.

In the case of dbt Cloud, for example, the artifacts could be downloaded from the API, and then:

  • catalog.json would provide the list of all columns and their description for all models/sources
  • manifest.json would provide details of the models used in {node}

I could potentially give this feature a go if you think it would be valuable.
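For illustration, the catalog side could be read like this (path and key layout follow dbt's documented catalog.json artifact):

import json

with open("target/catalog.json") as f:
    catalog = json.load(f)

# one entry per model, with column metadata keyed by column name
columns_by_node = {
    unique_id: list(entry["columns"])
    for unique_id, entry in catalog["nodes"].items()
}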

[fix] extra newlines

Maybe just on Windows: yml files with "#" comments in the original file end up getting an extra newline after each comment. For example, people typically comment out tests like this:

  - name: my_column
    description: exactly what you want it to mean
      #        tests:
      #          - not_null

  - name: next_column

But this comes out the other side of a dbt-osmosis run as:

  - name: my_column
    description: exactly what you want it to mean
      #        tests:

      #          - not_null


  - name: next_column

Note that this is pretty horrible because if you run it multiple times, then each run that changes the file will introduce a new blank line for every comment and the file will just explode.

Lint returns different results than sqlfluff cli

Given this file

with population_rank as (
    select
        country_code,
        country_name,
        value,
        s.year,
        rank() over (
            partition by country_code, country_name order by year desc
        ) as rank_years
    from {{ ref('_airbyte_raw_country_populations') }} as s
    group by 1, 2, 3
)

select
    country_code,
    country_name,
    value,
    year
from population_rank
where
    rank_years = 1
    and year > 2017

using sqlfluff 2.0.0a4
with this .sqlfluff config

[sqlfluff]
verbose = 0
nocolor = False

# Comma separated list of file extensions to lint.
# NB: This config will only apply in the root folder.
sql_file_exts = .sql,.sql.j2,.dml,.ddl

### OPINIONATED DEFAULTS ADJUST AS NECESSARY

dialect = snowflake
templater = dbt

recurse = 0
output_line_length = 80
max_line_length = 14

runaway_limit = 10
ignore = parsing,templating
ignore_templated_areas = True

# Suggested core rules as defined by the SQLFluff team
# L001 - L006
# L008 - L010
# L012 - L026
# L028
# L030
# L033
# L038 - L041
# L045 - L050
# L052
# L054

# Uncomment and adjust as needed
# rules = L001, L002, L003, L004, L005, L006, L007, L008, L009, L010, L011, L012, L013, L014, L015, L016, L017, L018, L019, L020, L021, L022, L023, L024, L025, L026, L027, L028, L029, L030, L032, L033, L036, L037, L038, L039, L040, L041, L042, L043, L044, L046, L047, L048, L049, L050, L051, L053, L054, L055, L058, L060, L061, L063, L064, L065

# Excluded rules
exclude_rules = L031, L034, L035, L051, L052, L054, L056, L057, L059, L062, L045

[sqlfluff:indentation]
tab_space_size = 4
indent_unit = space
indented_joins = False
indented_ctes = true
template_blocks_indent = True

[sqlfluff:templater]
unwrap_wrapped_queries = True

[sqlfluff:templater:dbt]
project_dir = ./

# Some rules can be configured directly from the config common to other rules.
[sqlfluff:rules]
layout:type:comma:line_position = trailing
allow_scalar = True
single_table_references = unqualified
unquoted_identifiers_policy = all



# Some rules have their own specific config.

[sqlfluff:rules:L007]  # Operator before/after new line
layout:type:binary_operator:line_position = leading

[sqlfluff:rules:L010]  # Keywords
capitalisation_policy = lower

[sqlfluff:rules:L011]
aliasing = explicit

[sqlfluff:rules:L012]
aliasing = explicit

[sqlfluff:rules:L013]
allow_scalar = True

[sqlfluff:rules:L014]
extended_capitalisation_policy = lower

[sqlfluff:rules:L016]
ignore_comment_clauses = True
ignore_comment_lines = True
indent_unit = space

[sqlfluff:rules:L019]
ayout:type:comma:line_position = trailing

[sqlfluff:rules:L026]
force_enable = True

[sqlfluff:rules:L028]
force_enable = True
single_table_references = unqualified

[sqlfluff:rules:L029]  # Keyword identifiers
unquoted_identifiers_policy = none

[sqlfluff:rules:L030]  # Function names
capitalisation_policy = lower

[sqlfluff:rules:L038]
select_clause_trailing_comma = forbid

[sqlfluff:rules:L040]  # Null & Boolean Literals
capitalisation_policy = lower

[sqlfluff:rules:L042]
# By default, allow subqueries in from clauses, but not join clauses.

forbid_subquery_in = both

[sqlfluff:rules:L047]  # Consistent syntax to count all rows
prefer_count_0 = False
prefer_count_1 = False

[sqlfluff:rules:L051]
fully_qualify_join_types = both

[sqlfluff:rules:L054]
group_by_and_order_by_style = explicit

[sqlfluff:rules:L063]
extended_capitalisation_policy = lower

[sqlfluff:rules:L064]
preferred_quoted_literal_style = single_quotes

[sqlfluff:rules:L066]
min_alias_length = 7

running this on the command line
/config/.local/bin/sqlfluff lint --format json --config /config/workspace/transform/.sqlfluff transform/models/L2_bays/country_demographics/current_population.sql

outputs

22:23:30  Unable to do partial parsing because config vars, config profile, or config target have changed
22:23:34  [WARNING]: Did not find matching node for patch with name '_AIRBYTE_RAW_ZIP_COORDINATES' in the 'models' section of file 'models/L1_inlets/loans/_airbyte_raw_zip_coordinates.yml'
[{"filepath": "transform/models/L2_bays/country_demographics/current_population.sql", "violations": [{"line_no": 1, "line_pos": 1, "code": "L016", "description": "Line is too long (25 > 14)."}, {"line_no": 3, "line_pos": 9, "code": "L016", "description": "Line is too long (21 > 14)."}, {"line_no": 4, "line_pos": 9, "code": "L016", "description": "Line is too long (21 > 14)."}, {"line_no": 6, "line_pos": 9, "code": "L016", "description": "Line is too long (15 > 14)."}, {"line_no": 6, "line_pos": 9, "code": "L028", "description": "Qualified reference 's.year' found in single table select."}, {"line_no": 7, "line_pos": 9, "code": "L016", "description": "Line is too long (21 > 14)."}, {"line_no": 8, "line_pos": 13, "code": "L016", "description": "Line is too long (70 > 14)."}, {"line_no": 9, "line_pos": 9, "code": "L016", "description": "Line is too long (23 > 14)."}, {"line_no": 10, "line_pos": 5, "code": "L016", "description": "Line is too long (59 > 14)."}, {"line_no": 10, "line_pos": 59, "code": "L066", "description": "Aliases should be at least 7 character(s) long."}, {"line_no": 11, "line_pos": 5, "code": "L016", "description": "Line is too long (20 > 14)."}, {"line_no": 15, "line_pos": 5, "code": "L016", "description": "Line is too long (17 > 14)."}, {"line_no": 16, "line_pos": 5, "code": "L016", "description": "Line is too long (17 > 14)."}, {"line_no": 19, "line_pos": 1, "code": "L016", "description": "Line is too long (20 > 14)."}, {"line_no": 21, "line_pos": 5, "code": "L016", "description": "Line is too long (18 > 14)."}, {"line_no": 22, "line_pos": 5, "code": "L016", "description": "Line is too long (19 > 14)."}]}]

whereas running this (with the URL quoted so the shell does not treat & as a background operator)
curl -X POST 'localhost:8581/lint?sql_path=/config/workspace/transform/models/L2_bays/country_demographics/current_population.sql&extra_config_path=/config/workspace/transform/.sqlfluff'

returns this

{"result":[{"code":"L028","description":"Qualified reference 's.year' found in single table select.","line_no":6,"line_pos":9},{"code":"L066","description":"Aliases should be at least 7 character(s) long.","line_no":10,"line_pos":59}]}

InvalidConnectionException

I keep getting the following error when trying to hit the "Test compiled query" button.

Any ideas what causes it?

Using DBT 1.3.0 with the dbt-glue adapter.

InvalidConnectionException: Runtime Error connection never acquired for thread (94394, 10798264320), have [(94394, 6157971456)]

File "/Users/USER1/Library/Caches/pypoetry/virtualenvs/scripts-KQs5TViC-py3.10/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 556, in _run_script
exec(code, module.__dict__)
File "/Users/USER1/Library/Caches/pypoetry/virtualenvs/scripts-KQs5TViC-py3.10/lib/python3.10/site-packages/dbt_osmosis/app.py", line 328, in
if compile_sql(state[RAW_SQL]) != state[COMPILED_SQL]:
File "/Users/USER1/Library/Caches/pypoetry/virtualenvs/scripts-KQs5TViC-py3.10/lib/python3.10/site-packages/dbt_osmosis/app.py", line 198, in compile_sql
return ctx.compile_sql(sql).compiled_sql
File "/Users/USER1/Library/Caches/pypoetry/virtualenvs/scripts-KQs5TViC-py3.10/lib/python3.10/site-packages/dbt_osmosis/core/osmosis.py", line 543, in compile_sql
return self.compile_sql(raw_sql, retry - 1)
File "/Users/USER1/Library/Caches/pypoetry/virtualenvs/scripts-KQs5TViC-py3.10/lib/python3.10/site-packages/dbt_osmosis/core/osmosis.py", line 543, in compile_sql
return self.compile_sql(raw_sql, retry - 1)
File "/Users/USER1/Library/Caches/pypoetry/virtualenvs/scripts-KQs5TViC-py3.10/lib/python3.10/site-packages/dbt_osmosis/core/osmosis.py", line 543, in compile_sql
return self.compile_sql(raw_sql, retry - 1)
File "/Users/USER1/Library/Caches/pypoetry/virtualenvs/scripts-KQs5TViC-py3.10/lib/python3.10/site-packages/dbt_osmosis/core/osmosis.py", line 544, in compile_sql
raise exc
File "/Users/USER1/Library/Caches/pypoetry/virtualenvs/scripts-KQs5TViC-py3.10/lib/python3.10/site-packages/dbt_osmosis/core/osmosis.py", line 540, in compile_sql
node = self.compile_node(self.get_server_node(raw_sql, temp_node_id))
File "/Users/USER1/Library/Caches/pypoetry/virtualenvs/scripts-KQs5TViC-py3.10/lib/python3.10/site-packages/dbt_osmosis/core/osmosis.py", line 554, in compile_node
compiled_node = self.sql_compiler.compile(self.dbt)
File "/Users/USER1/Library/Caches/pypoetry/virtualenvs/scripts-KQs5TViC-py3.10/lib/python3.10/site-packages/dbt/task/sql.py", line 39, in compile
return compiler.compile_node(self.node, manifest, {}, write=False)
File "/Users/USER1/Library/Caches/pypoetry/virtualenvs/scripts-KQs5TViC-py3.10/lib/python3.10/site-packages/dbt/compilation.py", line 535, in compile_node
node = self._compile_node(node, manifest, extra_context)
File "/Users/USER1/Library/Caches/pypoetry/virtualenvs/scripts-KQs5TViC-py3.10/lib/python3.10/site-packages/dbt/compilation.py", line 394, in _compile_node
compiled_node.compiled_code = jinja.get_rendered(
File "/Users/USER1/Library/Caches/pypoetry/virtualenvs/scripts-KQs5TViC-py3.10/lib/python3.10/site-packages/dbt/clients/jinja.py", line 587, in get_rendered
return render_template(template, ctx, node)
File "/Users/USER1/Library/Caches/pypoetry/virtualenvs/scripts-KQs5TViC-py3.10/lib/python3.10/site-packages/dbt/clients/jinja.py", line 542, in render_template
return template.render(ctx)
File "/Users/USER1/Library/Caches/pypoetry/virtualenvs/scripts-KQs5TViC-py3.10/lib/python3.10/site-packages/jinja2/environment.py", line 1301, in render
self.environment.handle_exception()
File "/Users/USER1/Library/Caches/pypoetry/virtualenvs/scripts-KQs5TViC-py3.10/lib/python3.10/site-packages/jinja2/environment.py", line 936, in handle_exception
raise rewrite_traceback_stack(source=source)
File "", line 11, in top-level template code
File "/Users/USER1/Library/Caches/pypoetry/virtualenvs/scripts-KQs5TViC-py3.10/lib/python3.10/site-packages/jinja2/sandbox.py", line 393, in call
return __context.call(__obj, *args, **kwargs)
File "/Users/USER1/Library/Caches/pypoetry/virtualenvs/scripts-KQs5TViC-py3.10/lib/python3.10/site-packages/dbt/clients/jinja.py", line 326, in call
return self.call_macro(*args, **kwargs)
File "/Users/USER1/Library/Caches/pypoetry/virtualenvs/scripts-KQs5TViC-py3.10/lib/python3.10/site-packages/dbt/clients/jinja.py", line 253, in call_macro
return macro(*args, **kwargs)
File "/Users/USER1/Library/Caches/pypoetry/virtualenvs/scripts-KQs5TViC-py3.10/lib/python3.10/site-packages/jinja2/runtime.py", line 777, in _invoke
rv = self._func(*arguments)
File "", line 6, in template
File "/Users/USER1/Library/Caches/pypoetry/virtualenvs/scripts-KQs5TViC-py3.10/lib/python3.10/site-packages/jinja2/sandbox.py", line 393, in call
return __context.call(__obj, *args, **kwargs)
File "/Users/USER1/Library/Caches/pypoetry/virtualenvs/scripts-KQs5TViC-py3.10/lib/python3.10/site-packages/dbt/adapters/glue/impl.py", line 156, in get_relation
session, client, cursor = self.get_connection()
File "/Users/USER1/Library/Caches/pypoetry/virtualenvs/scripts-KQs5TViC-py3.10/lib/python3.10/site-packages/dbt/adapters/glue/impl.py", line 72, in get_connection
connection: GlueConnectionManager = self.connections.get_thread_connection()
File "/Users/USER1/Library/Caches/pypoetry/virtualenvs/scripts-KQs5TViC-py3.10/lib/python3.10/site-packages/dbt/adapters/base/connections.py", line 91, in get_thread_connection
raise dbt.exceptions.InvalidConnectionException(key, list(self.thread_connections))
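
The traceback shows the compile running on a thread that never acquired an adapter connection. In dbt-core, a thread-local connection is normally opened with the adapter's connection_named context manager, so a possible workaround sketch (assuming access to the loaded adapter, compiler, node, and manifest objects from the stack above) is:

# Sketch: open a connection for the current thread before compiling, so adapter
# calls made during jinja rendering (e.g. glue's get_relation) can find one.
with adapter.connection_named("dbt-osmosis"):
    compiled_node = compiler.compile_node(node, manifest, {}, write=False)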

dbt.exceptions.DbtProfileError error while running dbt-osmosis

Hello,

I'm facing an issue while running dbt-osmosis with these commands:

dbt-osmosis run --project-dir ~/my/path/to/dbt --target sbx --profiles-dir ~/my/path/to/dbt
dbt-osmosis workbench

dbt has been configured and running well for several years now, but I'm getting this error:

dbt.exceptions.DbtProfileError: Runtime Error
  Credentials in profile "MYPROFIL", target "sbx" invalid: Runtime Error
    Could not find adapter type bigquery!

Here is my dbt --version command result:

installed version: 1.0.1
   latest version: 1.0.4

Your version of dbt is out of date! You can find instructions for upgrading here:
https://docs.getdbt.com/docs/installation

Plugins:
  - postgres: 0.21.0
  - redshift: 0.21.0
  - bigquery: 1.0.0
  - snowflake: 0.21.0

Do you know where this might come from, please?

Edit:

I found that even if I define the path in the UI, profiles_dir is never set; it always falls back to dbt.config.profile.DEFAULT_PROFILES_DIR. I created a profiles.yml file in the DEFAULT_PROFILES_DIR (default: ~/.dbt), and that seems to have solved the issue, but it's odd that profiles_dir is always None. Have I missed a step?
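
If it helps anyone else, a workaround sketch (relying on dbt's standard DBT_PROFILES_DIR environment variable; the path is the one from the commands above) is to point dbt at the profiles directory explicitly before launching:

import os

# Sketch: dbt consults DBT_PROFILES_DIR before falling back to
# dbt.config.profile.DEFAULT_PROFILES_DIR (~/.dbt).
os.environ["DBT_PROFILES_DIR"] = os.path.expanduser("~/my/path/to/dbt")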

Error starting workbench

I'm encountering the page output below (and on the CLI as well) when executing dbt-osmosis workbench -m unique_model_name. It seems it's getting a None value for a file path, but I'm not sure where or how exactly. This is my first run of dbt-osmosis workbench.

AttributeError: 'NoneType' object has no attribute 'patch_path'
Traceback:
File "C:\Users\myusername\AppData\Local\Programs\Python\Python310\lib\site-packages\streamlit\scriptrunner\script_runner.py", line 554, in _run_script
    exec(code, module.__dict__)
File "C:\Users\myusername\AppData\Local\Programs\Python\Python310\lib\site-packages\dbt_osmosis\app.py", line 345, in <module>
    Path(ctx.project_root) / ctx.get_patch_path(st.session_state[BASE_NODE]),
File "C:\Users\myusername\AppData\Local\Programs\Python\Python310\lib\site-packages\dbt_osmosis\core\osmosis.py", line 179, in get_patch_path
    return Path(node.patch_path.split(FILE_ADAPTER_POSTFIX)[-1])
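
A defensive variant of get_patch_path would guard against nodes with no patch file (a minimal sketch, assuming FILE_ADAPTER_POSTFIX is the module constant referenced in the traceback; this is not the project's actual fix):

from pathlib import Path
from typing import Optional

def get_patch_path(node) -> Optional[Path]:
    # Hypothetical guard: undocumented nodes carry patch_path = None, which is
    # exactly what raises the AttributeError above.
    if node is None or node.patch_path is None:
        return None
    return Path(node.patch_path.split(FILE_ADAPTER_POSTFIX)[-1])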

Misalignment between GitHub and PyPi releases?

As per subject line, I'm trying to understand why the latest release on GitHub is 0.10.3 however, on PyPi it's 0.10.8 at time of writing.

It seems like these should be aligned, but maybe there is something I am not taking into consideration...

dbt-osmosis yaml refactor is removing nested values

Hi,

I have some arrays in my BigQuery source table that are documented in my source.yml file as columns with dot notation.
Unfortunately, when I run dbt-osmosis yaml refactor, these fields are removed.

Example yaml:

columns:
  - name: images
    description: "Array with image urls and caption"
  - name: images.url
    description: "The url of the image"
  - name: images.caption
    description: "The caption of the image"

Output from dbt-osmosis yaml refactor:

INFO     🔧 Removing column images.caption from dbt schema
INFO     🔧 Removing column images.url from dbt schema
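
A dot-notation-aware membership check would need to treat these fields as children of their root column, along these lines (a hypothetical sketch, not dbt-osmosis's actual logic):

def is_known_column(name: str, database_columns: set) -> bool:
    # Hypothetical: treat "images.url" as a child of "images" so nested
    # BigQuery fields documented with dot notation are not flagged for removal.
    root = name.split(".", 1)[0]
    return root in database_columns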

Cheers
Andreas

Different models with same alias causing issues

dbt version 1.3.0
dbt-osmosis version 0.9.7

When running dbt-osmosis yaml refactor on my project it results in duplicates, i.e. the same model in multiple yaml files.
This seems to be caused by having multiple models with different names but the same alias (in different schemas).
The yml files are named correctly, i.e. the same as the model, but the model name inside the schema.yml is the alias rather than the model name, which then leads to failures.
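
One way to spot affected models up front is to scan the manifest for shared aliases (a sketch against dbt-core's manifest structure; manifest is assumed to be already parsed and loaded):

from collections import defaultdict

# Sketch: group model unique_ids by alias; any alias owned by more than one
# model is a candidate for the duplicate-yaml behavior described above.
aliases = defaultdict(list)
for unique_id, node in manifest.nodes.items():
    if node.resource_type == "model":
        aliases[node.alias].append(unique_id)
collisions = {alias: ids for alias, ids in aliases.items() if len(ids) > 1}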

VS Code - Osmosis Execute DBT SQL feature failing

Hello and thanks for looking at my issue. When I click on this button after opening a model sql file in VS Code,
[screenshot: the Execute DBT SQL button]

I get this error.
[screenshot: the error message]

The actual path is c:\\Temp\git\au-nvg-business-intelligence\\dbt\navigate_bi\\dbt_project.yml. Because there are single backslashes in two spots (\au-nvg-xxx and \navigate_bi\xxxx), both are being interpreted as the escape sequences \a and \n. I don't know why a mixture of single and double backslashes is coming through in the file name and path.

I've had to put a temp fix in osmosis.py at line 181 to get this working; see project_dir being rebuilt with replace calls below.

args = PseudoArgs(
    threads=threads,
    target=target,
    profiles_dir=profiles_dir,
    # project_dir=project_dir,
    project_dir=project_dir.replace('\t', '\\t').replace('\n', '\\n').replace('\r', '\\r').replace('\a', '\\a'),
)
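
The underlying behavior is reproducible in isolation: in a Python string, an unescaped backslash before certain letters produces a control character rather than a path separator (a hypothetical illustration with a shortened path):

# The "\n" below is parsed as a newline, not as "\" followed by "n"; this is
# the same mangling seen in two spots of the real path.
broken = 'c:\\Temp\navigate_bi\\dbt_project.yml'
assert '\n' in broken  # the path now contains an embedded newline

def unescape_path(p: str) -> str:
    # Mirrors the temp fix above: turn accidental control characters back into
    # their two-character escaped forms.
    return p.replace('\t', '\\t').replace('\n', '\\n').replace('\r', '\\r').replace('\a', '\\a')

print(unescape_path(broken))  # c:\Temp\navigate_bi\dbt_project.yml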
