Giter VIP home page Giter VIP logo

Comments (5)

graciegoheen avatar graciegoheen commented on June 12, 2024

Hi! I believe we do support json structured types in unit test definitions using format: dict (see our docs here).

You'll need to use quotes though like so:

- input: ref('table_with_super')
  format: dict
  rows:
    - { structured_column: '{"foo": "bar"}', int_column: 123 }

from dbt-core.

ronco avatar ronco commented on June 12, 2024

Hey @graciegoheen , thanks for getting back to me. I've given your suggestion a try and I'm still running into difficulty. When I have this input:

          - { created_at: '2024-05-17 21:05:00', extra_info: '{"amount_usd": 50}' }

it generates this SQL for the fixture data: cast('{"amount_usd": 50}' as super)
When I try to query that fixture I get NULL:

with input_data as (select    cast('{"amount_usd": 50}' as super) as extra_info)
select extra_info.amount_usd from input_data

amount_usd|
----------+
          |

It seems the straight casting of the string to super isn't decomposing the data into structured values. However, if I switch the SQL to use json_parse I get the expected value:

with input_data as (select    cast(json_parse('{"amount_usd": 50}') as super) as extra_info)
select extra_info.amount_usd from input_data

amount_usd|
----------+
50        |

Is there a setting I'm missing for the Redshift adapter perhaps to make this work? Any other suggestions?

Thanks!

from dbt-core.

dbeatty10 avatar dbeatty10 commented on June 12, 2024

Hey @ronco -- @graciegoheen and I discussed this a little bit yesterday.

You are spot-on with the following:

It seems the straight casting of the string to super isn't decomposing the data into structured values

This is because dbt unit testing doesn't do any unnesting / decomposition of the SUPER object -- it just leaves it as the entire object.

In order to do a unit test for one of the nested values, then you'll need to supply expected values for all of the nested values as well. See below for a full example.

If this doesn't work for you, could you share an example of your YAML for your unit test and your SQL models and we'll take another look?

More detail

It looks like the key insight in dbt-redshift is that:

  • You must specify all fields in a Redshift SUPER in a unit test. You cannot use only a subset of fields in a SUPER.

This is similar to a caveat in dbt-bigquery:

You must specify all fields in a BigQuery STRUCT in a unit test. You cannot use only a subset of fields in a STRUCT.

Example

models/stg_data_types.sql

select
    1 as int_field,
    2.0 as float_field, 
    1 as numeric_field,
    'my_string' as str_field,
    'my,cool''string' as str_escaped_field,
    true as bool_field,
    cast('2020-01-02' as date) as date_field,
    cast('2013-11-03 00:00:00-0' as timestamp without time zone) as timestamp_field,
    cast('2013-11-03 00:00:00-0' as timestamp with time zone) as timestamptz_field,
    cast(json_parse('{"bar": "baz", "balance": 7.77, "active": false}') as super) as json_field

models/fct_data_types.sql

select * from {{ ref("stg_data_types") }}
unit_tests:
  - name: test_my_data_types
    model: fct_data_types
    given:
      - input: ref('stg_data_types')
        rows:
         - int_field: 1
           float_field: 2.0
           numeric_field: 1
           str_field: my_string
           str_escaped_field: "my,cool'string"
           bool_field: true
           date_field: 2020-01-02
           timestamp_field: 2013-11-03 00:00:00-0
           timestamptz_field: 2013-11-03 00:00:00-0
           json_field: '{"bar": "baz", "balance": 7.77, "active": false}'
    expect:
      rows:
         - int_field: 1
           float_field: 2.0
           numeric_field: 1
           str_field: my_string
           str_escaped_field: "my,cool'string"
           bool_field: true
           date_field: 2020-01-02
           timestamp_field: 2013-11-03 00:00:00-0
           timestamptz_field: 2013-11-03 00:00:00-0
           json_field: '{"bar": "baz", "balance": 7.77, "active": false}'

Note how the last line in the unit test includes bar, balance, and active, even though you might only care about testing the value of balance.

from dbt-core.

ronco avatar ronco commented on June 12, 2024

Hey @dbeatty10 , thanks for all this guidance. I am a little confused on how to pull this off. We're using our SUPER field for flexible data storage. In other words, there is no underlying defined STRUCT, the content varies from row to row. So I'm not sure what values I would need to add.

As requested, here are sample files illustrating the issue:
stg_data_types.sql:

select
    1 as int_field,
    2.0 as float_field, 
    1 as numeric_field,
    'my_string' as str_field,
    'my,cool''string' as str_escaped_field,
    true as bool_field,
    cast('2020-01-02' as date) as date_field,
    cast('2013-11-03 00:00:00-0' as timestamp without time zone) as timestamp_field,
    cast('2013-11-03 00:00:00-0' as timestamp with time zone) as timestamptz_field,
    cast('' as super) as json_field

and fct_data_types.sql:

select str_field,
sum(json_field.balance) as sum_balance
from {{ ref("stg_data_types") }}
group by 1

and finally schema.yml:

unit_tests:
  - name: test_my_data_types
    model: fct_data_types
    given:
      - input: ref('stg_data_types')
        rows:
         - int_field: 1
           float_field: 2.0
           numeric_field: 1
           str_field: my_string
           str_escaped_field: "my,cool'string"
           bool_field: true
           date_field: 2020-01-02
           timestamp_field: 2013-11-03 00:00:00-0
           timestamptz_field: 2013-11-03 00:00:00-0
           json_field: '{"bar": "baz", "balance": 7.77, "active": false}'
    expect:
      rows:
         - str_field: 'my_string'
           sum_balance: 7.77

When I run that test I get the following output:

17:06:28  1 of 1 START unit_test fct_data_types::test_my_data_types ...................... [RUN]
17:06:30  1 of 1 FAIL 1 fct_data_types::test_my_data_types ............................... [FAIL 1 in 2.58s]
17:06:31
17:06:31  Finished running 1 unit test in 0 hours 0 minutes and 6.89 seconds (6.89s).
17:06:31
17:06:31  Completed with 1 error and 0 warnings:
17:06:31
17:06:31  Failure in unit_test test_my_data_types (models/dbt-test/schema.yml)
17:06:31

actual differs from expected:

@@,str_field,sum_balance
→ ,my_string,7.77→NULL

I think the problem is the cast is treating the json_field as a plain string, rather than a structured field I can extract data from. Let me know if there's anything else I can provide to get to the bottom of this.

Thanks!

from dbt-core.

ronco avatar ronco commented on June 12, 2024

Any further thoughts here? Thanks!

from dbt-core.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.