shawnbrown / datatest

Tools for test driven data-wrangling and data validation.
License: Other
I expected with AcceptedExtra(): to ignore missing keys in dicts, but instead it raises a Deviation from None.

Here is an example:

    actual = {'a': 1, 'b': 2}
    expected = {'b': 2}

    with AcceptedExtra():
        validate(actual, requirement=expected)
The output is:
E ValidationError: does not satisfy mapping requirements (1 difference): {
'a': Deviation(+1, None),
}
Thanks for the cool package, by the way!
Once major pieces are in place, explore ways of optimizing the validation/allowance process. Look to implement the following possible improvements:
Hi, all! I've run into a problem when starting my tests with pytest-xdist.

macOS (also checked on Debian)
python 3.8.2
pytest==5.4.3
pytest-xdist==1.33.0
datatest==0.9.6

    from datatest import accepted, Extra, validate as __validate

    def test_should_passed():
        with accepted(Extra):
            __validate({"qwe": 1}, {"qwe": 1}, "")

    def test_should_failed():
        with accepted(Extra):
            __validate({"qwe": 1}, {"qwe": 2}, "")

    if __name__ == '__main__':
        import sys, pytest
        sys.exit(pytest.main(['/Users/qa/PycharmProjects/qa/test123.py', '-vvv', '-n', '1', '-s']))
Output:
test123.py::test_should_passed
[gw0] PASSED test123.py::test_should_passed
test123.py::test_should_failed !!!!!!!!!!!!!!!!!!!! <ExceptionInfo RuntimeError('\'----------------------------------------------------------------------------------------------------\'.../issues\'\n\'----------------------------------------------------------------------------------------------------\'\n') tblen=14>
INTERNALERROR> Traceback (most recent call last):
INTERNALERROR> File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/xdist/workermanage.py", line 334, in process_from_remote
INTERNALERROR> rep = self.config.hook.pytest_report_from_serializable(
INTERNALERROR> File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/pluggy/hooks.py", line 286, in __call__
INTERNALERROR> return self._hookexec(self, self.get_hookimpls(), kwargs)
INTERNALERROR> File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/pluggy/manager.py", line 93, in _hookexec
INTERNALERROR> return self._inner_hookexec(hook, methods, kwargs)
INTERNALERROR> File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/pluggy/manager.py", line 84, in <lambda>
INTERNALERROR> self._inner_hookexec = lambda hook, methods, kwargs: hook.multicall(
INTERNALERROR> File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/pluggy/callers.py", line 208, in _multicall
INTERNALERROR> return outcome.get_result()
INTERNALERROR> File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/pluggy/callers.py", line 80, in get_result
INTERNALERROR> raise ex[1].with_traceback(ex[2])
INTERNALERROR> File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/pluggy/callers.py", line 187, in _multicall
INTERNALERROR> res = hook_impl.function(*args)
INTERNALERROR> File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/_pytest/reports.py", line 355, in pytest_report_from_serializable
INTERNALERROR> return TestReport._from_json(data)
INTERNALERROR> File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/_pytest/reports.py", line 193, in _from_json
INTERNALERROR> kwargs = _report_kwargs_from_json(reportdict)
INTERNALERROR> File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/_pytest/reports.py", line 485, in _report_kwargs_from_json
INTERNALERROR> reprtraceback = deserialize_repr_traceback(
INTERNALERROR> File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/_pytest/reports.py", line 468, in deserialize_repr_traceback
INTERNALERROR> repr_traceback_dict["reprentries"] = [
INTERNALERROR> File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/_pytest/reports.py", line 469, in <listcomp>
INTERNALERROR> deserialize_repr_entry(x) for x in repr_traceback_dict["reprentries"]
INTERNALERROR> File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/_pytest/reports.py", line 464, in deserialize_repr_entry
INTERNALERROR> _report_unserialization_failure(entry_type, TestReport, reportdict)
INTERNALERROR> File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/_pytest/reports.py", line 206, in _report_unserialization_failure
INTERNALERROR> raise RuntimeError(stream.getvalue())
INTERNALERROR> RuntimeError: '----------------------------------------------------------------------------------------------------'
INTERNALERROR> 'INTERNALERROR: Unknown entry type returned: DatatestReprEntry'
INTERNALERROR> "report_name: <class '_pytest.reports.TestReport'>"
INTERNALERROR> {'$report_type': 'TestReport',
INTERNALERROR> 'duration': 0.002020120620727539,
INTERNALERROR> 'item_index': 1,
INTERNALERROR> 'keywords': {'qa': 1, 'test123.py': 1, 'test_should_failed': 1},
INTERNALERROR> 'location': ('test123.py', 8, 'test_should_failed'),
INTERNALERROR> 'longrepr': {'chain': [({'extraline': None,
INTERNALERROR> 'reprentries': [{'data': {'lines': [' def '
INTERNALERROR> 'test_should_failed():',
INTERNALERROR> ' with '
INTERNALERROR> 'accepted(Extra):',
INTERNALERROR> '> '
INTERNALERROR> '__validate({"qwe": '
INTERNALERROR> '1}, {"qwe": 2}, '
INTERNALERROR> '"")',
INTERNALERROR> 'E '
INTERNALERROR> 'datatest.ValidationError: '
INTERNALERROR> 'does not '
INTERNALERROR> 'satisfy 2 (1 '
INTERNALERROR> 'difference): {',
INTERNALERROR> 'E '
INTERNALERROR> "'qwe': "
INTERNALERROR> 'Deviation(-1, '
INTERNALERROR> '2),',
INTERNALERROR> 'E }'],
INTERNALERROR> 'reprfileloc': {'lineno': 11,
INTERNALERROR> 'message': 'ValidationError',
INTERNALERROR> 'path': 'test123.py'},
INTERNALERROR> 'reprfuncargs': {'args': []},
INTERNALERROR> 'reprlocals': None,
INTERNALERROR> 'style': 'long'},
INTERNALERROR> 'type': 'DatatestReprEntry'}],
INTERNALERROR> 'style': 'long'},
INTERNALERROR> {'lineno': 11,
INTERNALERROR> 'message': 'datatest.ValidationError: does not '
INTERNALERROR> 'satisfy 2 (1 difference): {\n'
INTERNALERROR> " 'qwe': Deviation(-1, 2),\n"
INTERNALERROR> '}',
INTERNALERROR> 'path': '/Users/qa/PycharmProjects/qa/test123.py'},
INTERNALERROR> None)],
INTERNALERROR> 'reprcrash': {'lineno': 11,
INTERNALERROR> 'message': 'datatest.ValidationError: does not '
INTERNALERROR> 'satisfy 2 (1 difference): {\n'
INTERNALERROR> " 'qwe': Deviation(-1, 2),\n"
INTERNALERROR> '}',
INTERNALERROR> 'path': '/Users/qa/PycharmProjects/qa/test123.py'},
INTERNALERROR> 'reprtraceback': {'extraline': None,
INTERNALERROR> 'reprentries': [{'data': {'lines': [' def '
INTERNALERROR> 'test_should_failed():',
INTERNALERROR> ' '
INTERNALERROR> 'with '
INTERNALERROR> 'accepted(Extra):',
INTERNALERROR> '> '
INTERNALERROR> '__validate({"qwe": '
INTERNALERROR> '1}, '
INTERNALERROR> '{"qwe": '
INTERNALERROR> '2}, "")',
INTERNALERROR> 'E '
INTERNALERROR> 'datatest.ValidationError: '
INTERNALERROR> 'does not '
INTERNALERROR> 'satisfy 2 '
INTERNALERROR> '(1 '
INTERNALERROR> 'difference): '
INTERNALERROR> '{',
INTERNALERROR> 'E '
INTERNALERROR> "'qwe': "
INTERNALERROR> 'Deviation(-1, '
INTERNALERROR> '2),',
INTERNALERROR> 'E '
INTERNALERROR> '}'],
INTERNALERROR> 'reprfileloc': {'lineno': 11,
INTERNALERROR> 'message': 'ValidationError',
INTERNALERROR> 'path': 'test123.py'},
INTERNALERROR> 'reprfuncargs': {'args': []},
INTERNALERROR> 'reprlocals': None,
INTERNALERROR> 'style': 'long'},
INTERNALERROR> 'type': 'DatatestReprEntry'}],
INTERNALERROR> 'style': 'long'},
INTERNALERROR> 'sections': []},
INTERNALERROR> 'nodeid': 'test123.py::test_should_failed',
INTERNALERROR> 'outcome': 'failed',
INTERNALERROR> 'sections': [],
INTERNALERROR> 'testrun_uid': 'c913bf205a874a50a237dcf40d482d06',
INTERNALERROR> 'user_properties': [],
INTERNALERROR> 'when': 'call',
INTERNALERROR> 'worker_id': 'gw0'}
INTERNALERROR> 'Please report this bug at https://github.com/pytest-dev/pytest/issues'
INTERNALERROR> '----------------------------------------------------------------------------------------------------'
[gw0] node down: <ExceptionInfo RuntimeError('\'----------------------------------------------------------------------------------------------------\'.../issues\'\n\'----------------------------------------------------------------------------------------------------\'\n') tblen=14>
[gw0] FAILED test123.py::test_should_failed
replacing crashed worker gw0
[gw1] darwin Python 3.8.3 cwd: /Users/qa/PycharmProjects/qa
INTERNALERROR> Traceback (most recent call last):
INTERNALERROR> File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/_pytest/main.py", line 191, in wrap_session
INTERNALERROR> session.exitstatus = doit(config, session) or 0
INTERNALERROR> File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/_pytest/main.py", line 247, in _main
INTERNALERROR> config.hook.pytest_runtestloop(session=session)
INTERNALERROR> File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/pluggy/hooks.py", line 286, in __call__
INTERNALERROR> return self._hookexec(self, self.get_hookimpls(), kwargs)
INTERNALERROR> File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/pluggy/manager.py", line 93, in _hookexec
INTERNALERROR> return self._inner_hookexec(hook, methods, kwargs)
INTERNALERROR> File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/pluggy/manager.py", line 84, in <lambda>
INTERNALERROR> self._inner_hookexec = lambda hook, methods, kwargs: hook.multicall(
INTERNALERROR> File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/pluggy/callers.py", line 208, in _multicall
INTERNALERROR> return outcome.get_result()
INTERNALERROR> File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/pluggy/callers.py", line 80, in get_result
INTERNALERROR> raise ex[1].with_traceback(ex[2])
INTERNALERROR> File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/pluggy/callers.py", line 187, in _multicall
INTERNALERROR> res = hook_impl.function(*args)
INTERNALERROR> File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/xdist/dsession.py", line 112, in pytest_runtestloop
INTERNALERROR> self.loop_once()
INTERNALERROR> File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/xdist/dsession.py", line 135, in loop_once
INTERNALERROR> call(**kwargs)
INTERNALERROR> File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/xdist/dsession.py", line 263, in worker_runtest_protocol_complete
INTERNALERROR> self.sched.mark_test_complete(node, item_index, duration)
INTERNALERROR> File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/xdist/scheduler/load.py", line 151, in mark_test_complete
INTERNALERROR> self.node2pending[node].remove(item_index)
INTERNALERROR> KeyError: <WorkerController gw0>
But if I change the second test like this, everything works fine:

    def test_should_failed():
        try:
            with accepted(Extra):
                __validate({"qwe": 1}, {"qwe": 2}, "")
        except:
            raise ValueError

I don't know exactly where I should file a bug/issue about this :)
Change get_reader.from_excel() to accept keyword arguments that are passed on to the call to xlrd.open_workbook() (see line 325 in get_reader.py).
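The change amounts to forwarding extra keywords. A minimal sketch of the pattern, with a stand-in stub in place of the real xlrd.open_workbook() (the stub and its parameters are illustrative, not xlrd's actual implementation):

```python
def from_excel(path, worksheet=0, **kwds):
    """Sketch: forward any extra keyword arguments to the underlying
    open call. `open_workbook` is a stand-in stub here, not real xlrd."""
    def open_workbook(filename, on_demand=False, encoding_override=None):
        return {'filename': filename,
                'on_demand': on_demand,
                'encoding_override': encoding_override}
    # Keywords like on_demand=True pass straight through without
    # from_excel() having to name each one explicitly:
    return open_workbook(path, **kwds)
```

With this shape, callers can use any option the underlying reader supports without datatest having to mirror its signature.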
It would be nice to inherit from unittest2 when it's available. This will provide a clean way to support setUpModule()
and setUpClass()
on Python 3.1 and 2.6 as well as other features of the package.
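The usual conditional-import idiom for this (a sketch; on Pythons where unittest2 is absent, the plain unittest fallback is taken):

```python
try:
    import unittest2 as unittest  # backports setUpModule()/setUpClass() to 2.6/3.1
except ImportError:
    import unittest  # modern stdlib already has these features

class ExampleTests(unittest.TestCase):
    def test_something(self):
        self.assertTrue(True)
```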
Add docstrings for new allowances--for reference, use previous allowance's docstrings (link).
The PandasSource class is currently a minimal implementation.
Need to optimize the following methods:

- filter_rows()
- distinct()
- sum()
- count()
- mapreduce()
Need to normalize the API and fix msg handling for all of the allow... context managers.
For completely local testing, this project provides its own ./run-tests.sh script (run-tests.bat
on Windows) but adding Travis CI testing would be a nice touch.
Hey Shawn - wasn't one of the problems you spoke about at PyCon 2016 guaranteeing that all integers in a list are unique, in a way that is efficient for large sets of data?
Calling MultiSource() without providing any sub-sources should raise an error. Allowing empty MultiSource objects can cause confusing error messages.
The new validate function can compare mappings of data and when differences are found, they are returned as mappings of differences. All of the assert functions must be updated to handle this new behavior.
Currently, the deviation allowances (allowDeviation and allowPercentDeviation) can only be set to ranges around zero (asserts lower <= 0 <= upper). This should be changed to allow deviations within a given range even when the entire range is above or below zero.
The lower and upper arguments should also work when set to the exact same number:

    with self.allowPercentDeviation(lower=-1.0, upper=-1.0):  # Allows only 100%
        self.assertSubjectSum('C', ['A', 'B'])                # deviations but no
                                                              # others.
Currently, the __repr__() for difference objects returns keyword-style formatting even when **kwds keys are not valid identifiers:

    >>> kwds = {'foo': 'AAA', 'bar baz': 'BBB'}
    >>> datatest.Deviation(1, 12, **kwds)
    Deviation(+1, 12, bar baz='BBB', foo='AAA')  # <- Can NOT be eval'd!

When keys are not valid identifiers, dict-style formatting should be used instead:

    >>> datatest.Deviation(1, 12, **kwds)
    Deviation(+1, 12, **{'foo': 'AAA', 'bar baz': 'BBB'})  # <- CAN be eval'd!
Simplify DataSource() loading behavior. The default __init__() method should accept a variety of formats (CSV, records, XLSX, etc.) and handle the data loading appropriately.
Make all allowance types fully composable with each other using the binary operators & and |.
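One way to get this composability is to have each allowance wrap a predicate over differences and implement __and__/__or__ to combine predicates. This is only an illustrative sketch, not datatest's actual implementation:

```python
class Acceptance:
    """Sketch of a composable allowance: wraps a predicate that returns
    True when a difference should be accepted (illustrative only)."""
    def __init__(self, predicate):
        self.predicate = predicate

    def __and__(self, other):
        # Accept a difference only if BOTH allowances accept it.
        return Acceptance(lambda d: self.predicate(d) and other.predicate(d))

    def __or__(self, other):
        # Accept a difference if EITHER allowance accepts it.
        return Acceptance(lambda d: self.predicate(d) or other.predicate(d))

small = Acceptance(lambda d: abs(d) <= 2)   # accept small numeric deviations
positive = Acceptance(lambda d: d > 0)      # accept positive deviations

either = small | positive  # union of accepted differences
both = small & positive    # intersection of accepted differences
```

Composing at the predicate level keeps & and | associative and lets arbitrary allowance types interoperate.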
Add pass-through behavior to assertValid() to help replace functionality that will be lost when removing the more magical helper methods (like assertSubjectSum(), et al).
Using the existing data/required syntax can be cumbersome in certain cases. By adding support for an optional calling convention (a function signature), duplication can be reduced. While the data/required signature will remain the default behavior (which will appear in the docstring when introspected) a second, optional signature will also be supported:
TestCase.assertValid(data, required, msg=None)
TestCase.assertValid(function, /, msg=None)
Move old implementations to backward compatibility sub-package, rename new implementations, and update wrapper methods.
Squint objects are not being evaluated properly by the datatest.validate() function:

    import datatest
    import squint

    # Create a Select object.
    select = squint.Select([['A', 'B'], ['x', 1], ['y', 2], ['z', 3]])

    # Compare data to itself--passes as expected.
    datatest.validate(
        select({'A': {'B'}}),
        select({'A': {'B'}}).fetch(),  # <- Shouldn't be necessary.
    )

    # Compare data to itself--fails, unexpectedly.
    datatest.validate(
        select({'A': {'B'}}),
        select({'A': {'B'}}),  # <- Not properly handled!
    )

In the code above, the second call to datatest.validate() should pass but, instead, fails with the following message:
Traceback (most recent call last):
File "<input>", line 3, in <module>
select({'A': {'B'}}), # <- Not properly handled!
File "~/datatest-project/datatest/validation.py", line 291, in __call__
raise err
datatest.ValidationError: does not satisfy mapping requirements (3 differences): {
'x': [Invalid(1)],
'y': [Invalid(2)],
'z': [Invalid(3)],
}
Datatest provides backwards compatibility via sub-packages in datatest.__past__. Importing a sub-package will modify datatest's behavior by applying monkey patches.
Care needs to be taken to isolate the side-effects of these imports because their default behavior is to apply global state changes to datatest, itself.
Keep an eye on wesm/dataframe-protocol#1 and see if it makes sense to change datatest's normalization to support the DataFrame protocol instead of DataFrames specifically.
Currently, as noted in the documentation, "To use DataTestCase, you must define subjectData".
This requirement made more sense before we had plans for magic removal/reduction (see #11). But as the assertions are more properly decoupled, users should have the ability to run DataTestCase methods without a baked-in subjectData source.
Add a msg=None argument to the new allowances.
In cases where msg is None but a filter function has a __doc__, use the docstring as the message.
Currently DataError's init looks like the following:

    def __init__(self, msg, differences, subject=None, required=None):
        ...

It would be nice if this were more flexible and behaved more like AssertionError, perhaps with the following signature:

    def __init__(self, *args, **kwds):
        ...

Using the above scheme, args could be walked over and any difference objects found could be put into the difference property (before passing the remaining args to super().__init__()). The keywords could be added to a kwds property, or perhaps subject and required might be added explicitly.
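A sketch of how that walk could look. Difference here is a stand-in for datatest's difference classes, and the attribute names (differences, subject, required) are illustrative, not the final API:

```python
class Difference:
    """Stand-in for datatest's difference classes (illustrative only)."""
    def __init__(self, value):
        self.value = value

class DataError(AssertionError):
    def __init__(self, *args, **kwds):
        # Walk args, pulling difference objects into their own property
        # and passing everything else on to AssertionError.
        self.differences = [a for a in args if isinstance(a, Difference)]
        remaining = [a for a in args if not isinstance(a, Difference)]
        self.subject = kwds.get('subject')
        self.required = kwds.get('required')
        super().__init__(*remaining)
```

This keeps plain-message construction (DataError('msg')) working exactly like AssertionError while letting differences ride along positionally.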
Accept **fmtparams keywords and pass them to the underlying csv.reader() call:

    subjectData = CsvReader('myfile.csv', delimiter='\t', quotechar='|')
The API for allowPercentDeviation() should allow for multiple signatures (like the allowDeviation() method):

    allowPercentDeviation(tolerance, /, msg=None, **kwds_filter)
    allowPercentDeviation(lower, upper, msg=None, **kwds_filter)
DataSource raises a confusing error message when it fails to load data because of duplicate field names.
When data has multiple columns named "x":
ValueError: Duplicate values: x
When data has multiple columns where the name is blank:
ValueError: Duplicate values:
The error message should indicate that the "values" are actually field names. And in the case of blank field names, the error message should address this more clearly rather than trying to show a blank value.
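A sketch of what a clearer check could look like (not the existing DataSource code; the function name and message wording are illustrative):

```python
from collections import Counter

def check_fieldnames(fieldnames):
    """Raise a descriptive error for duplicate or blank field names."""
    duplicates = [name for name, count in Counter(fieldnames).items()
                  if count > 1]
    if duplicates:
        # Render blank names explicitly instead of showing an empty value.
        formatted = ', '.join(repr(n) if n else '<blank>' for n in duplicates)
        raise ValueError(f'duplicate field names: {formatted}')
```

With this, the blank-name case reads "duplicate field names: <blank>" rather than trailing off after a colon.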
The assertDataUnique() method operates in-memory without optimizations (see issue #9).
As mentioned earlier (see comment), explore the idea of implementing a Bloom filter approach to solve larger-than-RAM testing for uniques.
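A minimal Bloom filter sketch (the size and hash count below are arbitrary illustrations): membership tests can yield false positives but never false negatives, so it can flag candidate duplicates without holding every value in memory.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter for larger-than-RAM uniqueness screening."""
    def __init__(self, size=1024, hashes=3):
        self.size = size
        self.hashes = hashes
        self.bits = bytearray(size)  # one byte per bit position, for clarity

    def _positions(self, item):
        # Derive `hashes` independent positions from salted SHA-256 digests.
        for i in range(self.hashes):
            digest = hashlib.sha256(f'{i}:{item}'.encode()).digest()
            yield int.from_bytes(digest[:8], 'big') % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))

# Flag possible duplicates in a stream without storing every value:
seen = BloomFilter()
duplicates = []
for value in [1, 2, 3, 2, 4, 1]:
    if value in seen:
        duplicates.append(value)  # may include rare false positives
    seen.add(value)
```

In practice the candidates flagged here would get a second, exact pass; the filter's job is only to shrink that candidate set.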
It would be nice if running datatest with the -v flag returned a data quality report appropriate for non-developers (currently, it displays unittest-style verbose output).
Validation is mishandled when data is a squint query-mapping and requirement is a non-mapping object.

    import datatest
    import squint

    select = squint.Select([
        ['A', 'B'],
        ['x', 'foo'],
        ['x', 'bar'],
        ['y', 'foo'],
        ['y', 'bar'],
    ])

    selection = select({'A': 'B'})     # <- Query returns a mapping.
    datatest.validate(selection, str)  # <- Requirement is a non-mapping object.
The example above should pass but it fails with the following error:
$ python example.py
Traceback (most recent call last):
File "example.py", line 13, in <module>
datatest.validate(selection, str)
datatest.ValidationError: does not satisfy 'str' (2 differences): [
Invalid(('x', <Result object (evaltype=list) at 0x7fd43beb>)),
Invalid(('y', <Result object (evaltype=list) at 0x7fd43bed>)),
]
Create _compare...() functions to build collection of differences for mappings, sequences, sets, etc.
OK, this ticket is a big deal and should allow for all sorts of added flexibility.
1. Create a general comparison-wrapper that wraps instances of any type (set, dict, int, str, etc.).
2. Create an assertValid() method that handles comparisons for arbitrary objects (uses the comparison-wrapper internally). This should handle object-to-object, object-to-callable, and object-to-regex comparisons.
3. Remove the assertEqual() wrapper method and use addTypeEqualityFunc() to register a comparison function for the new wrapped type.
4. Add pass-through behavior to assertValid() to replace existing magical methods (like assertSubjectSum(), et al).
5. Remove all of the assertSubject...() methods (reimplement them in the backward compatibility sub-package using assertValid()).
[September 14th and October 6th edits, below]
After exploring different approaches for implementing this, I have settled on an alternate list of steps that allows for incremental progress and a simpler backward compatibility roadmap:
1. Create _compare...() functions to build collections of differences for mappings, sequences, sets, etc. (see #24).
2. Create an assertValid() method that wraps the _compare...() functions (see #25).
3. Add pass-through behavior to assertValid() to replace the more magical helper methods like assertSubjectSum(), et al (see #26).
4. Remove the assertSubject...() methods and add them to the backward compatibility sub-package (see #27).
5. Remove the assertEqual() wrapper and add it to the backward compatibility sub-package (see #28).
6. Add _validate...() functions for faster testing (short-circuit evaluation and Boolean return values) rather than using _compare...() functions in all cases (see #29).

Using assertEqual()...
self.assertEqual(9, 10)
...returns a unittest-style traceback that reads:
Traceback (most recent call last):
File "<stdin>", line 3, in test_example
AssertionError: 9 != 10
Using assertValid()
...
self.assertValid(9, 10)
...should return a data-test style traceback:
Traceback (most recent call last):
File "<stdin>", line 3, in test_example
DataError: Deviation(-1, 10)
Currently the allow_only() context manager expects an iterable (usually a list or dict) of differences.
Add the ability to pass a single difference not wrapped in any container.
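Internally this can be a small normalization step before the existing logic runs; a sketch with a stand-in Difference class (not datatest's real types):

```python
class Difference:
    """Stand-in for datatest's difference classes (illustrative only)."""
    def __init__(self, value):
        self.value = value

def normalize(differences):
    """Accept a single difference or a container of differences and
    always return a container, so downstream code can iterate."""
    if isinstance(differences, Difference):
        return [differences]
    return differences
```

With this in place, allow_only(Deviation(+1, 10)) and allow_only([Deviation(+1, 10)]) would behave identically.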
Remove all of the assertSubject...() methods from the core API and add them to the backward compatibility sub-package.
If an exception is thrown within a test that uses the test_db_engine fixture, the pytest_runtest_makereport function crashes.
The reason is that it uses Node's deprecated get_marker function, instead of the new get_closest_marker function.
See details about this change in pytest here:
https://docs.pytest.org/en/latest/mark.html#updating-code
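A version-tolerant helper illustrating the fix (a sketch; the real pytest_runtest_makereport hook isn't reproduced here, and the stand-in item classes below are demonstration stubs, not pytest Nodes):

```python
def get_marker_compat(item, name):
    """Prefer the modern get_closest_marker() API, falling back to the
    deprecated get_marker() on old pytest versions."""
    if hasattr(item, 'get_closest_marker'):
        return item.get_closest_marker(name)
    return item.get_marker(name)

# Tiny stand-in objects for demonstration only:
class NewStyleItem:
    def get_closest_marker(self, name):
        return f'closest:{name}'

class OldStyleItem:
    def get_marker(self, name):
        return f'legacy:{name}'
```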
Shawn,
I am trying your package to see if I can validate a CSV file by reading it with pandas. I am getting Extra(nan) from dt.validate.superset() or Invalid(nan) from dt.validate(). Is there a way I can include those nan values in my validation sets?
Error looks like
E ValidationError: may contain only elements of given superset (10000 differences): [
Extra(nan),
Extra(nan),
Extra(nan),
Note: I am reading this particular column as str
E ValidationError: does not satisfy 'str' (10000 differences): [
Invalid(nan),
Invalid(nan),
Invalid(nan),
Invalid(nan),
Let me know if you find a solution or can help me debug this.
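One workaround worth trying, since float NaN never compares equal to itself (so equality and set-membership checks can't match it): replace NaN with a sentinel before validating and include the sentinel in the requirement. A sketch using None as the sentinel (the sentinel choice is an assumption, not a datatest convention):

```python
import math

def replace_nan(values, sentinel=None):
    """Map float('nan') entries to a sentinel so they can participate
    in equality and set comparisons."""
    return [sentinel if isinstance(v, float) and math.isnan(v) else v
            for v in values]

data = ['a', 'b', float('nan'), float('nan')]
cleaned = replace_nan(data)
# A requirement set can now name the sentinel explicitly, e.g.
# {'a', 'b', None}, instead of never matching NaN.
```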
Issue #7 exposes the degree of magic that is currently present in the DataTestCase methods. Removing (or at least reducing) magic where possible would make the behavior easier to understand and explain.
In cases where small amounts of magic are useful, methods should be renamed to better reflect what's happening.
This "magic" version:

    def test_active(self):
        self.assertDataSet('active', {'Y', 'N'})

...is roughly equivalent to:

    def test_active(self):
        subject = self.subject.set('active')
        self.assertEqual(subject, {'Y', 'N'})

The magic version requires detailed knowledge about the method before a newcomer can guess what's happening. The latter example is more explicit and easier to reason about.
Having said this, the magic versions of DataTestCase's methods can save a lot of typing. So what I plan to do is:
- Add assertEqual() integration (see issue #7) as well as other standard unittest methods (assertGreater(), etc.).
- Rename the magic methods to better reflect what's happening (assertDataSum() → assertSubjectSum(), etc.).

Currently, comparison objects consider 0 == None to be False, but the compare() method considers this to be True, so this discrepancy is not returned in the list of differences.
Remove the assertEqual() wrapper and add it to backward compatibility sub-package.
Add how-to documentation for inequalities.
Should demonstrate:
- validate.interval() to create left- and right-bounded intervals (for greater-than-or-equal-to and less-than-or-equal-to)
- validating values outside an interval (x < min or max < x)
It would be nice to have a command line option to disregard the @mandatory decorator when running tests.
I'm thinking -i for "ignore" is a decent option:

    -i, --ignore    Ignore @mandatory flag and run all tests
Seeing the entire API on a single page can create confusion when users aren't familiar with the differences between unittest and pytest style conventions. It's not always apparent which method or function format is appropriate and mixing these different conventions is not advisable.
The API Reference page should be split into two sections:
Add DataTestCase.assertEqual() to check CompareSet and CompareDict objects (in the same way that assertListEqual() and assertSetEqual() are used to test list and set objects).
This will allow for explicit comparison should the implicit subjectData magic prove inadequate in some situations.
Also look into support for assertGreater(), assertGreaterEqual(), assertLess(), and assertLessEqual().
When building a string for ValidationError (with either __repr__() or __str__()), show differences in sorted order.
The following should raise an error:
>>> import datatest
>>> select = datatest.Selector()
>>> select = select.load_data('nonexistent_file.csv')
The method DataTestCase.assertDataColumns() should, optionally, accept a callable required argument:

    def test_columns(self):
        def lowercase(x):  # <- Callable helper-function.
            return x == x.lower()
        self.assertDataColumns(lowercase)
Create an assertValid() method that wraps the _compare...() functions.
Currently, when a mandatory test fails, the test runner exits with the following:
$ python -m datatest
.......F
==========================================================
FAIL: <test name here>
<traceback here>
datatest.error.DataError: mandatory test failed, stopping
early: data does not satisfy object requirement:
Invalid('Some Invalid Code')
----------------------------------------------------------
Ran 8 tests in 1.764s
FAILED (failures=1)
The summary at the end should, more prominently, indicate that not all the tests were run:
$ python -m datatest
.......F
==========================================================
FAIL: test_<test name here>
<traceback here>
datatest.error.DataError: mandatory test: data does not
satisfy object requirement:
Invalid('Some Invalid Code')
----------------------------------------------------------
Ran 8 tests in 1.764s
MANDATORY TEST FAILED, STOPPING EARLY
Add conditional install instructions to .travis.yml for all optional dependencies (currently just pandas and xlrd).
Add how-to documentation for checking counts and cardinality.
Should demonstrate:
- len(data) to validate the count of data elements
- collections.Counter(data) to validate counts per value

Should also mention that cardinality is a descriptive statistic that can be calculated with many other tools a developer might use (df[0].applymap(bool).sum(), select('A').filter().count(), etc.).
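The two checks above reduce to plain Python; a minimal illustration:

```python
from collections import Counter

data = ['x', 'x', 'y', 'z']

# Validate the total count of data elements:
assert len(data) == 4

# Validate counts per value (Counter compares equal to a plain dict):
assert Counter(data) == {'x': 2, 'y': 1, 'z': 1}
```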