Comments (5)
Hi! I believe we do support json structured types in unit test definitions using format: dict
(see our docs here).
You'll need to use quotes though like so:
- input: ref('table_with_super')
format: dict
rows:
- { structured_column: '{"foo": "bar"}', int_column: 123 }
from dbt-core.
Hey @graciegoheen , thanks for getting back to me. I've given your suggestion a try and I'm still running into difficulty. When I have this input:
- { created_at: '2024-05-17 21:05:00', extra_info: '{"amount_usd": 50}' }
it generates this SQL for the fixture data: cast('{"amount_usd": 50}' as super)
When I try to query that fixture I get NULL:
with input_data as (select cast('{"amount_usd": 50}' as super) as extra_info)
select extra_info.amount_usd from input_data
amount_usd|
----------+
|
It seems the straight casting of the string to super isn't decomposing the data into structured values. However, if I switch the SQL to use json_parse
I get the expected value:
with input_data as (select cast(json_parse('{"amount_usd": 50}') as super) as extra_info)
select extra_info.amount_usd from input_data
amount_usd|
----------+
50 |
Is there a setting I'm missing for the Redshift adapter perhaps to make this work? Any other suggestions?
Thanks!
from dbt-core.
Hey @ronco -- @graciegoheen and I discussed this a little bit yesterday.
You are spot-on with the following:
It seems the straight casting of the string to super isn't decomposing the data into structured values
This is because dbt unit testing doesn't do any unnesting / decomposition of the SUPER
object -- it just leaves it as the entire object.
In order to do a unit test for one of the nested values, then you'll need to supply expected values for all of the nested values as well. See below for a full example.
If this doesn't work for you, could you share an example of your YAML for your unit test and your SQL models and we'll take another look?
More detail
It looks like the key insight in dbt-redshift is that:
- You must specify all fields in a Redshift
SUPER
in a unit test. You cannot use only a subset of fields in aSUPER
.
This is similar to a caveat in dbt-bigquery:
You must specify all fields in a BigQuery
STRUCT
in a unit test. You cannot use only a subset of fields in aSTRUCT
.
Example
models/stg_data_types.sql
select
1 as int_field,
2.0 as float_field,
1 as numeric_field,
'my_string' as str_field,
'my,cool''string' as str_escaped_field,
true as bool_field,
cast('2020-01-02' as date) as date_field,
cast('2013-11-03 00:00:00-0' as timestamp without time zone) as timestamp_field,
cast('2013-11-03 00:00:00-0' as timestamp with time zone) as timestamptz_field,
cast(json_parse('{"bar": "baz", "balance": 7.77, "active": false}') as super) as json_field
models/fct_data_types.sql
select * from {{ ref("stg_data_types") }}
unit_tests:
- name: test_my_data_types
model: fct_data_types
given:
- input: ref('stg_data_types')
rows:
- int_field: 1
float_field: 2.0
numeric_field: 1
str_field: my_string
str_escaped_field: "my,cool'string"
bool_field: true
date_field: 2020-01-02
timestamp_field: 2013-11-03 00:00:00-0
timestamptz_field: 2013-11-03 00:00:00-0
json_field: '{"bar": "baz", "balance": 7.77, "active": false}'
expect:
rows:
- int_field: 1
float_field: 2.0
numeric_field: 1
str_field: my_string
str_escaped_field: "my,cool'string"
bool_field: true
date_field: 2020-01-02
timestamp_field: 2013-11-03 00:00:00-0
timestamptz_field: 2013-11-03 00:00:00-0
json_field: '{"bar": "baz", "balance": 7.77, "active": false}'
Note how the last line in the unit test includes bar
, balance
, and active
, even though you might only care about testing the value of balance
.
from dbt-core.
Hey @dbeatty10 , thanks for all this guidance. I am a little confused on how to pull this off. We're using our SUPER field for flexible data storage. In other words, there is no underlying defined STRUCT, the content varies from row to row. So I'm not sure what values I would need to add.
As requested, here are sample files illustrating the issue:
stg_data_types.sql
:
select
1 as int_field,
2.0 as float_field,
1 as numeric_field,
'my_string' as str_field,
'my,cool''string' as str_escaped_field,
true as bool_field,
cast('2020-01-02' as date) as date_field,
cast('2013-11-03 00:00:00-0' as timestamp without time zone) as timestamp_field,
cast('2013-11-03 00:00:00-0' as timestamp with time zone) as timestamptz_field,
cast('' as super) as json_field
and fct_data_types.sql
:
select str_field,
sum(json_field.balance) as sum_balance
from {{ ref("stg_data_types") }}
group by 1
and finally schema.yml
:
unit_tests:
- name: test_my_data_types
model: fct_data_types
given:
- input: ref('stg_data_types')
rows:
- int_field: 1
float_field: 2.0
numeric_field: 1
str_field: my_string
str_escaped_field: "my,cool'string"
bool_field: true
date_field: 2020-01-02
timestamp_field: 2013-11-03 00:00:00-0
timestamptz_field: 2013-11-03 00:00:00-0
json_field: '{"bar": "baz", "balance": 7.77, "active": false}'
expect:
rows:
- str_field: 'my_string'
sum_balance: 7.77
When I run that test I get the following output:
17:06:28 1 of 1 START unit_test fct_data_types::test_my_data_types ...................... [RUN]
17:06:30 1 of 1 FAIL 1 fct_data_types::test_my_data_types ............................... [FAIL 1 in 2.58s]
17:06:31
17:06:31 Finished running 1 unit test in 0 hours 0 minutes and 6.89 seconds (6.89s).
17:06:31
17:06:31 Completed with 1 error and 0 warnings:
17:06:31
17:06:31 Failure in unit_test test_my_data_types (models/dbt-test/schema.yml)
17:06:31
actual differs from expected:
@@,str_field,sum_balance
→ ,my_string,7.77→NULL
I think the problem is the cast is treating the json_field as a plain string, rather than a structured field I can extract data from. Let me know if there's anything else I can provide to get to the bottom of this.
Thanks!
from dbt-core.
Any further thoughts here? Thanks!
from dbt-core.
Related Issues (20)
- Add `--host` flag to `dbt docs serve`, defaulting to '127.0.0.1' HOT 2
- [Bug] `dbt -v` doesn't know that adapters are decoupled from core HOT 1
- Warn if `updated_at` field for snapshot is not same datatype (or timezone) as what's returned in `snapshot_get_time()` HOT 2
- Improve capturing the history of records in a deleted state HOT 2
- New snapshot config to validate uniqueness before merge
- Support specifying types for recording
- [Bug] the run_query macro causes unit tests to fail with a SQL Compilation error HOT 1
- [Feature] No more jinja block for snapshots - new snapshot design
- [Feature] Emit a more specific error when there are duplicate columns in a model with an enforced contract HOT 1
- Replace jsonschema with fastjsonschema for validation
- [Security] CVE-2019-8341 advice please HOT 5
- [Feature] Conservative with changing public Python API's HOT 1
- [Feature] Unit tests should support Gherkin table style inputs
- [Bug] Error running unit tests that use the `dbt_utils.star` macro HOT 2
- Create "happy path" project fixture and use it for `tests/functional/list/test_list.py`
- Use, and extend as necessary, happy path fixture for `tests/functional/show/test_show.py`
- [Bug] dbt test --select "source:*" runs model tests if they reference a source HOT 3
- [Feature] Add a head command to see the first rows of a table HOT 5
- [Feature] Add `--fail-if-nothing-selected` (or something similar) HOT 6
- [Bug] Not able to select unit tests via the `--resource-type` flag HOT 4
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from dbt-core.