Comments (3)
You are correct about the accepted.tolerance() behavior: it's only applied to direct child values, not to values within nested dictionaries. A workaround would be to convert nested dictionaries into a flattened dictionary with composite keys.
Here's a function that converts nested dictionaries into a flat dictionary with composite tuple keys:
def flatten(d, parent_key=()):
    """Helper function to flatten nested dictionaries."""
    items = []
    for k, v in d.items():
        # Always build a tuple so multi-character string keys are never split.
        new_key = parent_key + (k,)
        if isinstance(v, dict):
            items.extend(flatten(v, new_key).items())
        else:
            # Top-level keys stay as-is; only nested keys become tuples.
            items.append((new_key if parent_key else k, v))
    return dict(items)
Using the function above, you could flatten the dictionaries like so:
>>> dict1 = {"x": {"a": 0.99, "b": 2.0}, "y": 3.0}
>>> dict2 = {"x": {"a": 1.0, "b": 1.99}, "y": 2.99}
>>> flatten(dict1)
{('x', 'a'): 0.99, ('x', 'b'): 2.0, 'y': 3.0}
>>> flatten(dict2)
{('x', 'a'): 1.0, ('x', 'b'): 1.99, 'y': 2.99}
This would let you change your sample code to the following:
import pytest
from datatest import validate, accepted, ValidationError

def flatten(d, parent_key=()):
    """Helper function to flatten nested dictionaries."""
    items = []
    for k, v in d.items():
        # Always build a tuple so multi-character string keys are never split.
        new_key = parent_key + (k,)
        if isinstance(v, dict):
            items.extend(flatten(v, new_key).items())
        else:
            # Top-level keys stay as-is; only nested keys become tuples.
            items.append((new_key if parent_key else k, v))
    return dict(items)

def test_datatest():
    dict1 = {"x": {"a": 0.991, "b": 2.0}, "y": 3.0}
    dict2 = {"x": {"a": 1.0, "b": 1.991}, "y": 2.991}
    with accepted.tolerance(0.01):
        validate(flatten(dict1), flatten(dict2))  # <- Flattened for validation.

# NOTE: I changed the `.99`s in this sample code because
# the floating point math was giving me a difference of
# `0.010000000000000009` (outside the accepted tolerance).
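The floating point note above can be checked in plain Python without datatest. This is only a minimal sketch of the arithmetic; the variable names are illustrative, and it does not show how accepted.tolerance() compares values internally:

```python
# Subtracting 0.99 from 1.0 in binary floating point lands a hair above 0.01,
# so a tolerance of exactly 0.01 does not cover it.
difference = 1.0 - 0.99
print(difference)
exceeds_tolerance = difference > 0.01

# With 0.991 the difference is about 0.009, comfortably inside the tolerance.
adjusted_difference = 1.0 - 0.991
within_tolerance = abs(adjusted_difference) <= 0.01
```

If the original `.99` values matter, a slightly larger tolerance would also work, since the overshoot is far smaller than the tolerance itself.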
I like the idea of validating nested dictionary values directly, but the implementation gets more complex than it might initially seem. Since ValidationError differences reflect the structure of the tested data, nested dictionaries would mean nested difference handling. At this time, the internal acceptance machinery is not set up to handle this sort of thing, and in combination with accepted.count() it would have resulted in non-deterministic behavior when running on older versions of Python. This is because it was written to support versions of Python that didn't guarantee dictionaries with stable order.
That said, future versions of datatest will drop support for those old versions of Python and direct validation of nested values should be possible. But that's not something I can add in the short term. For now, the dictionaries will need to be flattened for validation.
This is a good question though and I should definitely add a page to the How-to Guide that addresses this use case.
from datatest.
A different flatten() function could combine the keys into a single string value. Doing this is less precise than the tuple-keys version shown previously, but many use cases don't need to preserve the keys exactly and the result can be more readable:
def flatten(d, parent_key="", sep="."):
    """Helper function to flatten nested dictionaries."""
    items = []
    for k, v in d.items():
        new_key = f"{parent_key}{sep}{k}" if parent_key else k
        if isinstance(v, dict):
            items.extend(flatten(v, new_key, sep=sep).items())
        else:
            items.append((new_key, v))
    return dict(items)
This function would give more compact keys:
>>> dict1 = {"x": {"a": 0.99, "b": 2.0}, "y": 3.0}
>>> dict2 = {"x": {"a": 1.0, "b": 1.99}, "y": 2.99}
>>> flatten(dict1)
{'x.a': 0.99, 'x.b': 2.0, 'y': 3.0}
>>> flatten(dict2)
{'x.a': 1.0, 'x.b': 1.99, 'y': 2.99}
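To make the "less precise" caveat concrete, here is a small sketch of one way string keys can lose information: when a key already contains the separator, two different paths can collapse into the same flat key. The sample data (`nested`) is hypothetical and chosen only to trigger the clash; the helper is repeated so the snippet runs on its own.

```python
def flatten(d, parent_key="", sep="."):
    """Same string-key helper as above, repeated so this runs standalone."""
    items = []
    for k, v in d.items():
        new_key = f"{parent_key}{sep}{k}" if parent_key else k
        if isinstance(v, dict):
            items.extend(flatten(v, new_key, sep=sep).items())
        else:
            items.append((new_key, v))
    return dict(items)

# Hypothetical data: a top-level key that already contains the separator.
nested = {"x": {"a": 1.0}, "x.a": 2.0}

# Both values map to the flat key "x.a", so the later one overwrites the earlier.
flat = flatten(nested)
print(flat)

# A separator that never appears in the keys keeps the two entries distinct.
flat_safe = flatten(nested, sep="/")
print(flat_safe)
```

If keys like this can show up in your data, the tuple-keys version is probably the safer choice, since it keeps each path component separate.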
Thanks @shawnbrown for the excellent, fast response. The workaround with flatten() that combined the keys into a single string was perfect for my use-case.
It's no problem if you want to close this issue, preferably after updating the documentation :).