
Comments (7)

will-sargent-dbtlabs commented on June 3, 2024

Yeah, thanks. I suspected it was a complex issue. It's not hard to imagine a scenario where continuing the parse actually creates a bunch of false errors.

Scripts! I'm definitely game. Perhaps someday we can have a toolbox of these in the Cloud IDE. Thanks!!

from dbt-core.

will-sargent-dbtlabs commented on June 3, 2024

@jtcohen6 @dbeatty10 - Is this syntax actually still "pointless" but valid? We're hitting a pretty hard parser error here as part of a version upgrade path.


joellabes commented on June 3, 2024

@will-sargent-dbtlabs I'm going to move this to the dbt core repo - the Parsing Error you're seeing here is not being returned from the JSON Schema validation (which is informative only) but from core itself.


dbeatty10 commented on June 3, 2024

we had to painfully work through the parse and reparse to find each error, because the parser dies on these errors. (Find one and die, fix, full parse, find next and die).

@will-sargent-dbtlabs Oooch 😬

I'm not sure yet whether we'll change anything in 1.7 to directly address this type of situation.

But as a workaround, here is a simple Python script that recursively searches every YAML file under your current working directory (run it from the root of your dbt project) for `tests:` keys that have no value, i.e. keys that load as `None`.


search_none_tests.py

import yaml
import glob
import os


def find_none_tests(data, path=None):
    """
    Recursively search for keys named 'tests' with None values in the given data.
    :param data: The current part of the data to search through.
    :param path: The path (keys and list indices) to this point; defaults to the root.
    :yield: The path to each 'tests' key whose value is None.
    """
    if path is None:
        path = []  # avoid a shared mutable default argument
    if isinstance(data, dict):
        for key, value in data.items():
            if key == "tests" and value is None:
                yield path + [key]
            else:
                yield from find_none_tests(value, path + [key])
    elif isinstance(data, list):
        for index, item in enumerate(data):
            yield from find_none_tests(item, path + [index])


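def examine_yaml_file(yaml_file_path):
    if not os.path.isfile(yaml_file_path):
        return
    try:
        with open(yaml_file_path, "r", encoding="utf-8") as file:
            data = yaml.safe_load(file)
    except (OSError, UnicodeDecodeError, yaml.YAMLError) as error:
        # Report and skip unreadable or unparseable files instead of stopping the whole search
        print(f"Skipping {yaml_file_path}: {error}")
        return

    none_tests_paths = list(find_none_tests(data))
    if none_tests_paths:
        print(f"Found `tests` key with None values in {yaml_file_path}:")
        for path in none_tests_paths:
            print("    " + " -> ".join(map(str, path)))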


def search_and_examine_yaml_files():
    # Search for all YAML files in the current directory and all subdirectories
    yaml_files = glob.glob("**/*.yaml", recursive=True) + glob.glob(
        "**/*.yml", recursive=True
    )

    if not yaml_files:
        print("No YAML files found.")
        return

    for yaml_file in yaml_files:
        examine_yaml_file(yaml_file)


if __name__ == "__main__":
    search_and_examine_yaml_files()

Then run it like this:

python search_none_tests.py

And get output like this:

Found `tests` key with None values in models/schema.yaml:
    models -> 0 -> columns -> 0 -> tests
Found `tests` key with None values in models/_properties.yml:
    models -> 0 -> tests
    models -> 0 -> columns -> 0 -> tests


dbeatty10 commented on June 3, 2024

I compared this scenario in versions 1.4 and 1.5. It was allowed in 1.4, but 1.5 raises the following error:

00:19:44  Encountered an error:
Parsing Error
  Invalid models config given in models/_models.yml @ models: {'name': 'my_model', 'tests': None, 'columns': [{'name': 'id', 'tests': ['not_null']}], 'original_file_path': 'models/_models.yml', 'yaml_key': 'models', 'package_name': 'my_project'} - at path ['tests']: None is not of type 'array'

Since this scenario is explicitly called out in the migration guide for 1.5 (see screenshot below), I'm going to close this as "not planned".

[Screenshot: the dbt 1.5 migration guide section that covers this change]


will-sargent-dbtlabs commented on June 3, 2024

Thanks for the docs link, @dbeatty10.
Not planning to allow it makes sense to me.

Also, thanks for providing the script!


dbeatty10 commented on June 3, 2024

However, is there a way that the parser won't die completely each time it hits an error like this?

We hear you on how painful this was 😢

Due to the complexities involved, I don't see us moving off the "die upon first parsing failure" approach.

We also had no idea how many of these we would hit because of that, so we are kind of like, how long do we keep this up, (can we fix in a few minutes) or is this like a sprint-impacting spike we need to do...

If you were to do this migration from 1.4 over again from scratch, I'd suggest first running the Python script in #9845 (comment) to see how many of these you'd be facing. That would help you assess and estimate how much effort it would take to resolve this particular upgrade edge case.

Alternatively, I could imagine adapting some of the YAML-editing strategies used in dbt-meshify or dbt-osmosis into a custom program that performs the update automatically. That is, it would edit all the relevant YAML files in place to handle this part of the migration from 1.4 to 1.5.
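As a rough sketch of that idea, assuming block-style YAML indented with spaces: the hypothetical script below (`fix_empty_tests` and `main` are illustrative names, not part of dbt, dbt-meshify, or dbt-osmosis, and not how those tools work) edits files line by line using only the standard library. It flags each `tests:` key whose value is empty or an explicit `null`/`~` and rewrites it to `tests: []`, keeping indentation and any trailing comment. The `[]` replacement is an assumption: an empty list is of type 'array', so it should satisfy the check in the 1.5 error; deleting the key is the other obvious fix. By default it only reports, which also gives you the count needed to estimate the work.

```python
import re
import sys
from pathlib import Path

# Hypothetical helper; assumes block-style YAML indented with spaces.
# Matches a `tests:` key (optionally the first key of a list item) whose value is
# empty or an explicit null, e.g. `tests:`, `- tests: ~`, `tests:  # TODO`.
EMPTY_TESTS = re.compile(
    r"^(?P<prefix>\s*(?:-\s+)?)tests:"
    r"(?:\s+(?P<value>null|Null|NULL|~))?"
    r"(?:\s+(?P<comment>#.*))?\s*$"
)


def _has_block_value(lines, index, key_col):
    """True if the lines after `index` hold a nested value for the key at `key_col`."""
    for line in lines[index + 1:]:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank and comment-only lines don't end the key
        indent = len(line) - len(line.lstrip())
        if indent > key_col:
            return True
        # YAML allows a block sequence at the same indentation as its parent key.
        return indent == key_col and (stripped == "-" or stripped.startswith("- "))
    return False  # end of file: the key has no value


def fix_empty_tests(text):
    """Rewrite null `tests:` values to `tests: []`; return (new_text, changed_line_numbers)."""
    lines = text.splitlines(keepends=True)
    changed = []
    for i, line in enumerate(lines):
        body = line.rstrip("\r\n")
        match = EMPTY_TESTS.match(body)
        if not match:
            continue
        prefix = match.group("prefix")
        if match.group("value") is None and _has_block_value(lines, i, len(prefix)):
            continue  # e.g. `tests:` followed by an indented `- not_null`
        comment = f"  {match.group('comment')}" if match.group("comment") else ""
        lines[i] = f"{prefix}tests: []{comment}{line[len(body):]}"
        changed.append(i + 1)
    return "".join(lines), changed


def main(root=".", write=False):
    """Report null `tests:` keys under `root`; rewrite the files only if `write` is True."""
    total = 0
    yaml_paths = sorted(p for p in Path(root).rglob("*") if p.suffix in (".yml", ".yaml"))
    for path in yaml_paths:
        try:
            text = path.read_text(encoding="utf-8")
        except (OSError, UnicodeDecodeError):
            continue  # unreadable file: skip it rather than stop the whole run
        new_text, changed = fix_empty_tests(text)
        if not changed:
            continue
        total += len(changed)
        print(f"{path}: line(s) {', '.join(map(str, changed))}")
        if write:
            path.write_text(new_text, encoding="utf-8")
    action = "rewritten to `tests: []`" if write else "found (dry run)"
    print(f"{total} null `tests:` key(s) {action}")
    return total


if __name__ == "__main__":
    main(write="--write" in sys.argv[1:])
```

Saved as, say, `fix_empty_tests.py` (a hypothetical file name), `python fix_empty_tests.py` run from the project root prints a dry-run report, and `python fix_empty_tests.py --write` rewrites the files. Because it is a text heuristic, lines inside block scalars (for example a multi-line `description: |` that happens to contain a `tests:` line) could be misread, so run it on a clean working tree and review the diff before committing.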
