Is this your first time submitting a feature request? <ul class="contains-task-l

Hey <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url=

[Feature] Support Semi-Structured Data Columns in Unit Test dict fixture data about dbt-core HOT 5 OPEN

ronco commented on June 12, 2024

[Feature] Support Semi-Structured Data Columns in Unit Test dict fixture data

from dbt-core.

Comments (5)

graciegoheen commented on June 12, 2024

Hi! I believe we do support json structured types in unit test definitions using format: dict (see our docs here).

You'll need to use quotes though like so:

- input: ref('table_with_super')
  format: dict
  rows:
    - { structured_column: '{"foo": "bar"}', int_column: 123 }

from dbt-core.

ronco commented on June 12, 2024

Hey @graciegoheen , thanks for getting back to me. I've given your suggestion a try and I'm still running into difficulty. When I have this input:

          - { created_at: '2024-05-17 21:05:00', extra_info: '{"amount_usd": 50}' }

it generates this SQL for the fixture data: cast('{"amount_usd": 50}' as super)
When I try to query that fixture I get NULL:

with input_data as (select    cast('{"amount_usd": 50}' as super) as extra_info)
select extra_info.amount_usd from input_data

amount_usd|
----------+
          |

It seems the straight casting of the string to super isn't decomposing the data into structured values. However, if I switch the SQL to use json_parse I get the expected value:

with input_data as (select    cast(json_parse('{"amount_usd": 50}') as super) as extra_info)
select extra_info.amount_usd from input_data

amount_usd|
----------+
50        |

Is there a setting I'm missing for the Redshift adapter perhaps to make this work? Any other suggestions?

Thanks!

from dbt-core.

dbeatty10 commented on June 12, 2024

Hey @ronco -- @graciegoheen and I discussed this a little bit yesterday.

You are spot-on with the following:

It seems the straight casting of the string to super isn't decomposing the data into structured values

This is because dbt unit testing doesn't do any unnesting / decomposition of the SUPER object -- it just leaves it as the entire object.

In order to do a unit test for one of the nested values, then you'll need to supply expected values for all of the nested values as well. See below for a full example.

If this doesn't work for you, could you share an example of your YAML for your unit test and your SQL models and we'll take another look?

More detail

It looks like the key insight in dbt-redshift is that:

You must specify all fields in a Redshift SUPER in a unit test. You cannot use only a subset of fields in a SUPER.

This is similar to a caveat in dbt-bigquery:

You must specify all fields in a BigQuery STRUCT in a unit test. You cannot use only a subset of fields in a STRUCT.

Example

models/stg_data_types.sql

select
    1 as int_field,
    2.0 as float_field, 
    1 as numeric_field,
    'my_string' as str_field,
    'my,cool''string' as str_escaped_field,
    true as bool_field,
    cast('2020-01-02' as date) as date_field,
    cast('2013-11-03 00:00:00-0' as timestamp without time zone) as timestamp_field,
    cast('2013-11-03 00:00:00-0' as timestamp with time zone) as timestamptz_field,
    cast(json_parse('{"bar": "baz", "balance": 7.77, "active": false}') as super) as json_field

models/fct_data_types.sql

select * from {{ ref("stg_data_types") }}

unit_tests:
  - name: test_my_data_types
    model: fct_data_types
    given:
      - input: ref('stg_data_types')
        rows:
         - int_field: 1
           float_field: 2.0
           numeric_field: 1
           str_field: my_string
           str_escaped_field: "my,cool'string"
           bool_field: true
           date_field: 2020-01-02
           timestamp_field: 2013-11-03 00:00:00-0
           timestamptz_field: 2013-11-03 00:00:00-0
           json_field: '{"bar": "baz", "balance": 7.77, "active": false}'
    expect:
      rows:
         - int_field: 1
           float_field: 2.0
           numeric_field: 1
           str_field: my_string
           str_escaped_field: "my,cool'string"
           bool_field: true
           date_field: 2020-01-02
           timestamp_field: 2013-11-03 00:00:00-0
           timestamptz_field: 2013-11-03 00:00:00-0
           json_field: '{"bar": "baz", "balance": 7.77, "active": false}'

Note how the last line in the unit test includes bar, balance, and active, even though you might only care about testing the value of balance.

from dbt-core.

ronco commented on June 12, 2024

Hey @dbeatty10 , thanks for all this guidance. I am a little confused on how to pull this off. We're using our SUPER field for flexible data storage. In other words, there is no underlying defined STRUCT, the content varies from row to row. So I'm not sure what values I would need to add.

As requested, here are sample files illustrating the issue:
stg_data_types.sql:

select
    1 as int_field,
    2.0 as float_field, 
    1 as numeric_field,
    'my_string' as str_field,
    'my,cool''string' as str_escaped_field,
    true as bool_field,
    cast('2020-01-02' as date) as date_field,
    cast('2013-11-03 00:00:00-0' as timestamp without time zone) as timestamp_field,
    cast('2013-11-03 00:00:00-0' as timestamp with time zone) as timestamptz_field,
    cast('' as super) as json_field

and fct_data_types.sql:

select str_field,
sum(json_field.balance) as sum_balance
from {{ ref("stg_data_types") }}
group by 1

and finally schema.yml:

unit_tests:
  - name: test_my_data_types
    model: fct_data_types
    given:
      - input: ref('stg_data_types')
        rows:
         - int_field: 1
           float_field: 2.0
           numeric_field: 1
           str_field: my_string
           str_escaped_field: "my,cool'string"
           bool_field: true
           date_field: 2020-01-02
           timestamp_field: 2013-11-03 00:00:00-0
           timestamptz_field: 2013-11-03 00:00:00-0
           json_field: '{"bar": "baz", "balance": 7.77, "active": false}'
    expect:
      rows:
         - str_field: 'my_string'
           sum_balance: 7.77

When I run that test I get the following output:

17:06:28  1 of 1 START unit_test fct_data_types::test_my_data_types ...................... [RUN]
17:06:30  1 of 1 FAIL 1 fct_data_types::test_my_data_types ............................... [FAIL 1 in 2.58s]
17:06:31
17:06:31  Finished running 1 unit test in 0 hours 0 minutes and 6.89 seconds (6.89s).
17:06:31
17:06:31  Completed with 1 error and 0 warnings:
17:06:31
17:06:31  Failure in unit_test test_my_data_types (models/dbt-test/schema.yml)
17:06:31

actual differs from expected:

@@,str_field,sum_balance
→ ,my_string,7.77→NULL

I think the problem is the cast is treating the json_field as a plain string, rather than a structured field I can extract data from. Let me know if there's anything else I can provide to get to the bottom of this.

Thanks!

from dbt-core.

ronco commented on June 12, 2024

Any further thoughts here? Thanks!

from dbt-core.

[Feature] Support Semi-Structured Data Columns in Unit Test dict fixture data about dbt-core HOT 5 OPEN

Comments (5)

More detail

Example

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent