dbt-artifacts-parser's Introduction

dbt-artifacts-parser

This is a dbt artifacts parser in Python. It lets you work with catalog.json, manifest.json, run-results.json and sources.json as Python objects.

Supported Versions and Compatibility

⚠️ Important Note:

  • Pydantic v1 will not be supported for dbt 1.9 or later.
  • To parse dbt 1.9 or later, please migrate your code to pydantic v2.
  • We will reassess version compatibility upon the release of pydantic v3.

Version  Supported dbt Version  Supported pydantic Version
0.7      dbt 1.5 to 1.8         pydantic v2
0.6      dbt 1.5 to 1.8         pydantic v1
0.5      dbt 1.5 to 1.7         pydantic v1
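
The compatibility note can be turned into a quick environment check. The following is a minimal sketch and not part of dbt-artifacts-parser: the helper names are hypothetical, and it only uses the standard library to read the installed pydantic version.

```python
from importlib import metadata
from typing import Optional


def major_version(version: str) -> int:
    """Return the leading integer of a version string such as '2.7.1'."""
    return int(version.split(".")[0])


def installed_pydantic_major() -> Optional[int]:
    """Return the major version of the installed pydantic, or None if it is absent."""
    try:
        return major_version(metadata.version("pydantic"))
    except metadata.PackageNotFoundError:
        return None


def can_parse_dbt_1_9_or_later(pydantic_major: Optional[int]) -> bool:
    """Per the note above, dbt 1.9 or later requires pydantic v2 (v3 is not yet assessed)."""
    return pydantic_major == 2
```

Calling can_parse_dbt_1_9_or_later(installed_pydantic_major()) before parsing gives an early, readable failure instead of a validation error deep inside the parser.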

Installation

pip install -U dbt-artifacts-parser

Python classes

These are the classes used to parse dbt artifacts.

Catalog

Manifest

Run Results

Sources

Examples

Parse catalog.json

import json

# parse any version of catalog.json
from dbt_artifacts_parser.parser import parse_catalog

with open("path/to/catalog.json", "r") as fp:
    catalog_dict = json.load(fp)
    catalog_obj = parse_catalog(catalog=catalog_dict)

# parse catalog.json v1
from dbt_artifacts_parser.parser import parse_catalog_v1

with open("path/to/catalog.json", "r") as fp:
    catalog_dict = json.load(fp)
    catalog_obj = parse_catalog_v1(catalog=catalog_dict)

Parse manifest.json

import json

# parse any version of manifest.json
from dbt_artifacts_parser.parser import parse_manifest

with open("path/to/manifest.json", "r") as fp:
    manifest_dict = json.load(fp)
    manifest_obj = parse_manifest(manifest=manifest_dict)

# parse manifest.json v1
from dbt_artifacts_parser.parser import parse_manifest_v1

with open("path/to/manifest.json", "r") as fp:
    manifest_dict = json.load(fp)
    manifest_obj = parse_manifest_v1(manifest=manifest_dict)

# parse manifest.json v2
from dbt_artifacts_parser.parser import parse_manifest_v2

with open("path/to/manifest.json", "r") as fp:
    manifest_dict = json.load(fp)
    manifest_obj = parse_manifest_v2(manifest=manifest_dict)

# parse manifest.json v3
from dbt_artifacts_parser.parser import parse_manifest_v3

with open("path/to/manifest.json", "r") as fp:
    manifest_dict = json.load(fp)
    manifest_obj = parse_manifest_v3(manifest=manifest_dict)

# parse manifest.json v4
from dbt_artifacts_parser.parser import parse_manifest_v4

with open("path/to/manifest.json", "r") as fp:
    manifest_dict = json.load(fp)
    manifest_obj = parse_manifest_v4(manifest=manifest_dict)

# parse manifest.json v5
from dbt_artifacts_parser.parser import parse_manifest_v5

with open("path/to/manifest.json", "r") as fp:
    manifest_dict = json.load(fp)
    manifest_obj = parse_manifest_v5(manifest=manifest_dict)

# parse manifest.json v6
from dbt_artifacts_parser.parser import parse_manifest_v6

with open("path/to/manifest.json", "r") as fp:
    manifest_dict = json.load(fp)
    manifest_obj = parse_manifest_v6(manifest=manifest_dict)

# parse manifest.json v7
from dbt_artifacts_parser.parser import parse_manifest_v7

with open("path/to/manifest.json", "r") as fp:
    manifest_dict = json.load(fp)
    manifest_obj = parse_manifest_v7(manifest=manifest_dict)

# parse manifest.json v8
from dbt_artifacts_parser.parser import parse_manifest_v8

with open("path/to/manifest.json", "r") as fp:
    manifest_dict = json.load(fp)
    manifest_obj = parse_manifest_v8(manifest=manifest_dict)

# parse manifest.json v9
from dbt_artifacts_parser.parser import parse_manifest_v9

with open("path/to/manifest.json", "r") as fp:
    manifest_dict = json.load(fp)
    manifest_obj = parse_manifest_v9(manifest=manifest_dict)

# parse manifest.json v10
from dbt_artifacts_parser.parser import parse_manifest_v10

with open("path/to/manifest.json", "r") as fp:
    manifest_dict = json.load(fp)
    manifest_obj = parse_manifest_v10(manifest=manifest_dict)

# parse manifest.json v11
from dbt_artifacts_parser.parser import parse_manifest_v11

with open("path/to/manifest.json", "r") as fp:
    manifest_dict = json.load(fp)
    manifest_obj = parse_manifest_v11(manifest=manifest_dict)

# parse manifest.json v12
from dbt_artifacts_parser.parser import parse_manifest_v12

with open("path/to/manifest.json", "r") as fp:
    manifest_dict = json.load(fp)
    manifest_obj = parse_manifest_v12(manifest=manifest_dict)
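
To choose a version-specific parser yourself, you can read the schema version that dbt records in the artifact's metadata.dbt_schema_version field (a URL such as https://schemas.getdbt.com/dbt/manifest/v12.json). The helper below is a minimal sketch with a hypothetical name; it does not describe how parse_manifest is implemented internally.

```python
import re
from typing import Any, Dict


def manifest_schema_version(manifest: Dict[str, Any]) -> int:
    """Extract the integer schema version from metadata.dbt_schema_version."""
    url = manifest["metadata"]["dbt_schema_version"]
    # The URL is expected to end in ".../v<number>.json".
    match = re.search(r"/v(\d+)\.json$", url)
    if match is None:
        raise ValueError(f"unrecognized dbt_schema_version: {url!r}")
    return int(match.group(1))
```

With the version in hand, you can dispatch to the matching parse_manifest_vN function shown above.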

Parse run-results.json

import json

# parse any version of run-results.json
from dbt_artifacts_parser.parser import parse_run_results

with open("path/to/run-results.json", "r") as fp:
    run_results_dict = json.load(fp)
    run_results_obj = parse_run_results(run_results=run_results_dict)

# parse run-results.json v1
from dbt_artifacts_parser.parser import parse_run_results_v1

with open("path/to/run-results.json", "r") as fp:
    run_results_dict = json.load(fp)
    run_results_obj = parse_run_results_v1(run_results=run_results_dict)

# parse run-results.json v2
from dbt_artifacts_parser.parser import parse_run_results_v2

with open("path/to/run-results.json", "r") as fp:
    run_results_dict = json.load(fp)
    run_results_obj = parse_run_results_v2(run_results=run_results_dict)

# parse run-results.json v3
from dbt_artifacts_parser.parser import parse_run_results_v3

with open("path/to/run-results.json", "r") as fp:
    run_results_dict = json.load(fp)
    run_results_obj = parse_run_results_v3(run_results=run_results_dict)

# parse run-results.json v4
from dbt_artifacts_parser.parser import parse_run_results_v4

with open("path/to/run-results.json", "r") as fp:
    run_results_dict = json.load(fp)
    run_results_obj = parse_run_results_v4(run_results=run_results_dict)

# parse run-results.json v5
from dbt_artifacts_parser.parser import parse_run_results_v5

with open("path/to/run-results.json", "r") as fp:
    run_results_dict = json.load(fp)
    run_results_obj = parse_run_results_v5(run_results=run_results_dict)

# parse run-results.json v6
from dbt_artifacts_parser.parser import parse_run_results_v6

with open("path/to/run-results.json", "r") as fp:
    run_results_dict = json.load(fp)
    run_results_obj = parse_run_results_v6(run_results=run_results_dict)

Parse sources.json

import json

# parse any version of sources.json
from dbt_artifacts_parser.parser import parse_sources

with open("path/to/sources.json", "r") as fp:
    sources_dict = json.load(fp)
    sources_obj = parse_sources(sources=sources_dict)

# parse sources.json v1
from dbt_artifacts_parser.parser import parse_sources_v1

with open("path/to/sources.json", "r") as fp:
    sources_dict = json.load(fp)
    sources_obj = parse_sources_v1(sources=sources_dict)

# parse sources.json v2
from dbt_artifacts_parser.parser import parse_sources_v2

with open("path/to/sources.json", "r") as fp:
    sources_dict = json.load(fp)
    sources_obj = parse_sources_v2(sources=sources_dict)

# parse sources.json v3
from dbt_artifacts_parser.parser import parse_sources_v3

with open("path/to/sources.json", "r") as fp:
    sources_dict = json.load(fp)
    sources_obj = parse_sources_v3(sources=sources_dict)

Contributors

yu-iskw (Yu Ishikawa)
dlawin
bbrewington (Brent Brewington)
judahrand (Judah Rand)
nabilm (Mohamed Nabil Mahmoud Hafez)
OnkarVO7 (Onkar Ravgan)
meyer-glean

dbt-artifacts-parser's Issues

ManifestV10 parsing errors on full 1.6 release

"metrics" are not passing validation for V10. Example:

13:10:44 ERROR    51203 validation errors for ManifestV10
                  metrics -> metric.package.metric_name -> type_params -> input_measures
                    extra fields not permitted (type=value_error.extra)

It looks like the schema at https://schemas.getdbt.com/dbt/manifest/v10.json was updated on Aug. 7th:

        "dbt_version": {
          "type": "string",
          "default": "1.6.0"
        },
        "generated_at": {
          "type": "string",
          "format": "date-time",
          "default": "2023-08-07T20:10:03.381822Z"
        },

Security Policy violation SECURITY.md

This issue was automatically created by Allstar.

Security Policy Violation
Security policy not enabled.
A SECURITY.md file can give users information about what constitutes a vulnerability and how to report one securely so that information about a bug is not publicly visible. Examples of secure reporting methods include using an issue tracker with private issue support, or encrypted email with a published key.

To fix this, add a SECURITY.md file that explains how to handle vulnerabilities found in your repository. Go to https://github.com/yu-iskw/dbt-artifacts-parser/security/policy to enable.

For more information, see https://docs.github.com/en/code-security/getting-started/adding-a-security-policy-to-your-repository.


This issue will auto resolve when the policy is in compliance.

Issue created by Allstar. See https://github.com/ossf/allstar/ for more information. For questions specific to the repository, please contact the owner or maintainer.

resource_type not defined correctly for manifest v11

In manifest v11, the resource_type field of ModelNode is defined as Any here. As a result, the actual value from manifest.json is mapped to a plain str in the pydantic model.

In the previous manifest versions (v10, v9), this field was mapped to an enum in Python. Examples: here and here.

This causes type mismatches when processing the v11 manifest.

A solution would be to define the v11 resource_type attribute as an enum instead of Any.
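
The mismatch can be illustrated with the standard library alone. This is a minimal sketch: the ResourceType enum and the coerce_resource_type helper are hypothetical stand-ins, not the classes generated by dbt-artifacts-parser.

```python
from enum import Enum
from typing import Any, Dict


class ResourceType(Enum):
    # Hypothetical enum mirroring how earlier manifest versions typed the field.
    model = "model"


def coerce_resource_type(node: Dict[str, Any]) -> ResourceType:
    """Convert the raw string from manifest.json into the enum member."""
    return ResourceType(node["resource_type"])


# A plain string never compares equal to a plain Enum member, so code written
# against the enum silently stops matching once the field is typed as Any.
raw_value = "model"
matches_enum = raw_value == ResourceType.model  # False
coerced = coerce_resource_type({"resource_type": raw_value})  # ResourceType.model
```

Until the field is typed as an enum again, coercing the value explicitly as above is one possible workaround on the caller's side.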

@yu-iskw

Add an option to be more permissive with extra fields for manifest parsing

dbt Cloud released a versionless deployment mode, which seems to roll out updates to the manifest. It adds new properties.

You can check schema here: https://schemas.getdbt.com/dbt/manifest/v12.json

I think we should be more permissive here and let users specify whether extra fields should be blocking. I would guess that in the vast majority of cases (lineage parsing tools such as https://github.com/open-metadata/OpenMetadata) they should not be blocking.
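
The requested switch can be sketched independently of pydantic. This is only an illustration of the behaviour being proposed: the function name and the allow_extra parameter are hypothetical and are not an existing option of dbt-artifacts-parser.

```python
from typing import Any, Dict, Set, Tuple


def split_fields(
    data: Dict[str, Any],
    known_fields: Set[str],
    allow_extra: bool = True,
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Separate known fields from extra ones; fail only in strict mode."""
    extra = {key: value for key, value in data.items() if key not in known_fields}
    if extra and not allow_extra:
        # Strict mode mirrors the "extra fields not permitted" behaviour.
        raise ValueError(f"extra fields not permitted: {sorted(extra)}")
    known = {key: value for key, value in data.items() if key in known_fields}
    return known, extra
```

In permissive mode, a lineage tool can keep using the known fields and ignore or log the extras; strict mode stays available for users who want schema drift to fail loudly.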

none is not an allowed value: sources -> source.default.xyz -> metadata -> type

I get the above error with dbt version 1.4.1 on Snowflake.

The error is related to this line: https://github.com/yu-iskw/dbt-artifacts-parser/blob/main/dbt_artifacts_parser/parsers/catalog/catalog_v1.py#L33

The obvious fix is to change the type to Optional[str], but if I understand correctly, this code is auto-generated based on this line, which in turn is based on the dbt-core definition here.

What's the standard process in this case?

Using local copies of artifact schemas for validation causes breakage

Recently the package dbt-common, which is a dependency of dbt-core, was updated from 1.5.0 to 1.6.0. This introduced schema changes in manifest.json, which in turn caused this package to break: it uses a local copy of the V12 schema, while the upstream V12 schema has since been updated to include the new changes (see here).

At first I opened an issue with them (please see here), but apparently their policy is that changes to the schemas are allowed to happen between versions. This means your package might fail at any time (for example, when any dbt-core dependencies are updated), even if no new artifact schema version was released.
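
One way to notice this kind of drift early is to compare the property names in the local schema copy against a freshly fetched upstream schema. The following is a minimal sketch under assumptions: both schemas are already loaded as dictionaries, and the helper names are hypothetical rather than part of this package.

```python
from typing import Any, List, Set


def property_paths(schema: Any, prefix: str = "") -> Set[str]:
    """Collect a path for every name declared under any 'properties' mapping."""
    paths: Set[str] = set()
    if isinstance(schema, dict):
        for key, value in schema.items():
            path = f"{prefix}/{key}"
            if key == "properties" and isinstance(value, dict):
                paths.update(f"{path}/{name}" for name in value)
            paths |= property_paths(value, path)
    elif isinstance(schema, list):
        for index, item in enumerate(schema):
            paths |= property_paths(item, f"{prefix}/{index}")
    return paths


def added_properties(local: Any, upstream: Any) -> List[str]:
    """Return property paths present upstream but missing from the local copy."""
    return sorted(property_paths(upstream) - property_paths(local))
```

A non-empty result indicates that the upstream schema gained properties without a version bump, which is the situation described in this issue.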

Support pydantic 2

As of this writing, we support only pydantic v1. It would be good to support pydantic v2 as well.

Base Classes

I'm not really sure how this would work, however, it might be nice to be able to have base classes for the various artifacts which would allow for better type hinting than BaseModel.

I'm open to ideas and open to helping.
