shawnbrown / datatest

Tools for test driven data-wrangling and data validation.
License: Other
I expected with AcceptedExtra(): to ignore missing keys in dicts, but instead it raises a Deviation from None.

Here is an example:

    actual = {'a': 1, 'b': 2}
    expected = {'b': 2}

    with AcceptedExtra():
        validate(actual, requirement=expected)
The output is:
E ValidationError: does not satisfy mapping requirements (1 difference): {
'a': Deviation(+1, None),
}
Thanks for the cool package, by the way!
Once major pieces are in place, explore ways of optimizing the validation/allowance process. Look to implement the following possible improvements:
Hi, all! I've run into a problem when starting my tests with pytest-xdist.

macOS (also checked on Debian)
python 3.8.2
pytest==5.4.3
pytest-xdist==1.33.0
datatest==0.9.6

    from datatest import accepted, Extra, validate as __validate

    def test_should_passed():
        with accepted(Extra):
            __validate({"qwe": 1}, {"qwe": 1}, "")

    def test_should_failed():
        with accepted(Extra):
            __validate({"qwe": 1}, {"qwe": 2}, "")

    if __name__ == '__main__':
        import sys, pytest
        sys.exit(pytest.main(['/Users/qa/PycharmProjects/qa/test123.py', '-vvv', '-n', '1', '-s']))
Output:
test123.py::test_should_passed
[gw0] PASSED test123.py::test_should_passed
test123.py::test_should_failed !!!!!!!!!!!!!!!!!!!! <ExceptionInfo RuntimeError('\'----------------------------------------------------------------------------------------------------\'.../issues\'\n\'----------------------------------------------------------------------------------------------------\'\n') tblen=14>
INTERNALERROR> Traceback (most recent call last):
INTERNALERROR> File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/xdist/workermanage.py", line 334, in process_from_remote
INTERNALERROR> rep = self.config.hook.pytest_report_from_serializable(
INTERNALERROR> File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/pluggy/hooks.py", line 286, in __call__
INTERNALERROR> return self._hookexec(self, self.get_hookimpls(), kwargs)
INTERNALERROR> File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/pluggy/manager.py", line 93, in _hookexec
INTERNALERROR> return self._inner_hookexec(hook, methods, kwargs)
INTERNALERROR> File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/pluggy/manager.py", line 84, in <lambda>
INTERNALERROR> self._inner_hookexec = lambda hook, methods, kwargs: hook.multicall(
INTERNALERROR> File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/pluggy/callers.py", line 208, in _multicall
INTERNALERROR> return outcome.get_result()
INTERNALERROR> File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/pluggy/callers.py", line 80, in get_result
INTERNALERROR> raise ex[1].with_traceback(ex[2])
INTERNALERROR> File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/pluggy/callers.py", line 187, in _multicall
INTERNALERROR> res = hook_impl.function(*args)
INTERNALERROR> File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/_pytest/reports.py", line 355, in pytest_report_from_serializable
INTERNALERROR> return TestReport._from_json(data)
INTERNALERROR> File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/_pytest/reports.py", line 193, in _from_json
INTERNALERROR> kwargs = _report_kwargs_from_json(reportdict)
INTERNALERROR> File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/_pytest/reports.py", line 485, in _report_kwargs_from_json
INTERNALERROR> reprtraceback = deserialize_repr_traceback(
INTERNALERROR> File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/_pytest/reports.py", line 468, in deserialize_repr_traceback
INTERNALERROR> repr_traceback_dict["reprentries"] = [
INTERNALERROR> File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/_pytest/reports.py", line 469, in <listcomp>
INTERNALERROR> deserialize_repr_entry(x) for x in repr_traceback_dict["reprentries"]
INTERNALERROR> File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/_pytest/reports.py", line 464, in deserialize_repr_entry
INTERNALERROR> _report_unserialization_failure(entry_type, TestReport, reportdict)
INTERNALERROR> File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/_pytest/reports.py", line 206, in _report_unserialization_failure
INTERNALERROR> raise RuntimeError(stream.getvalue())
INTERNALERROR> RuntimeError: '----------------------------------------------------------------------------------------------------'
INTERNALERROR> 'INTERNALERROR: Unknown entry type returned: DatatestReprEntry'
INTERNALERROR> "report_name: <class '_pytest.reports.TestReport'>"
INTERNALERROR> {'$report_type': 'TestReport',
INTERNALERROR> 'duration': 0.002020120620727539,
INTERNALERROR> 'item_index': 1,
INTERNALERROR> 'keywords': {'qa': 1, 'test123.py': 1, 'test_should_failed': 1},
INTERNALERROR> 'location': ('test123.py', 8, 'test_should_failed'),
INTERNALERROR> 'longrepr': {'chain': [({'extraline': None,
INTERNALERROR> 'reprentries': [{'data': {'lines': [' def '
INTERNALERROR> 'test_should_failed():',
INTERNALERROR> ' with '
INTERNALERROR> 'accepted(Extra):',
INTERNALERROR> '> '
INTERNALERROR> '__validate({"qwe": '
INTERNALERROR> '1}, {"qwe": 2}, '
INTERNALERROR> '"")',
INTERNALERROR> 'E '
INTERNALERROR> 'datatest.ValidationError: '
INTERNALERROR> 'does not '
INTERNALERROR> 'satisfy 2 (1 '
INTERNALERROR> 'difference): {',
INTERNALERROR> 'E '
INTERNALERROR> "'qwe': "
INTERNALERROR> 'Deviation(-1, '
INTERNALERROR> '2),',
INTERNALERROR> 'E }'],
INTERNALERROR> 'reprfileloc': {'lineno': 11,
INTERNALERROR> 'message': 'ValidationError',
INTERNALERROR> 'path': 'test123.py'},
INTERNALERROR> 'reprfuncargs': {'args': []},
INTERNALERROR> 'reprlocals': None,
INTERNALERROR> 'style': 'long'},
INTERNALERROR> 'type': 'DatatestReprEntry'}],
INTERNALERROR> 'style': 'long'},
INTERNALERROR> {'lineno': 11,
INTERNALERROR> 'message': 'datatest.ValidationError: does not '
INTERNALERROR> 'satisfy 2 (1 difference): {\n'
INTERNALERROR> " 'qwe': Deviation(-1, 2),\n"
INTERNALERROR> '}',
INTERNALERROR> 'path': '/Users/qa/PycharmProjects/qa/test123.py'},
INTERNALERROR> None)],
INTERNALERROR> 'reprcrash': {'lineno': 11,
INTERNALERROR> 'message': 'datatest.ValidationError: does not '
INTERNALERROR> 'satisfy 2 (1 difference): {\n'
INTERNALERROR> " 'qwe': Deviation(-1, 2),\n"
INTERNALERROR> '}',
INTERNALERROR> 'path': '/Users/qa/PycharmProjects/qa/test123.py'},
INTERNALERROR> 'reprtraceback': {'extraline': None,
INTERNALERROR> 'reprentries': [{'data': {'lines': [' def '
INTERNALERROR> 'test_should_failed():',
INTERNALERROR> ' '
INTERNALERROR> 'with '
INTERNALERROR> 'accepted(Extra):',
INTERNALERROR> '> '
INTERNALERROR> '__validate({"qwe": '
INTERNALERROR> '1}, '
INTERNALERROR> '{"qwe": '
INTERNALERROR> '2}, "")',
INTERNALERROR> 'E '
INTERNALERROR> 'datatest.ValidationError: '
INTERNALERROR> 'does not '
INTERNALERROR> 'satisfy 2 '
INTERNALERROR> '(1 '
INTERNALERROR> 'difference): '
INTERNALERROR> '{',
INTERNALERROR> 'E '
INTERNALERROR> "'qwe': "
INTERNALERROR> 'Deviation(-1, '
INTERNALERROR> '2),',
INTERNALERROR> 'E '
INTERNALERROR> '}'],
INTERNALERROR> 'reprfileloc': {'lineno': 11,
INTERNALERROR> 'message': 'ValidationError',
INTERNALERROR> 'path': 'test123.py'},
INTERNALERROR> 'reprfuncargs': {'args': []},
INTERNALERROR> 'reprlocals': None,
INTERNALERROR> 'style': 'long'},
INTERNALERROR> 'type': 'DatatestReprEntry'}],
INTERNALERROR> 'style': 'long'},
INTERNALERROR> 'sections': []},
INTERNALERROR> 'nodeid': 'test123.py::test_should_failed',
INTERNALERROR> 'outcome': 'failed',
INTERNALERROR> 'sections': [],
INTERNALERROR> 'testrun_uid': 'c913bf205a874a50a237dcf40d482d06',
INTERNALERROR> 'user_properties': [],
INTERNALERROR> 'when': 'call',
INTERNALERROR> 'worker_id': 'gw0'}
INTERNALERROR> 'Please report this bug at https://github.com/pytest-dev/pytest/issues'
INTERNALERROR> '----------------------------------------------------------------------------------------------------'
[gw0] node down: <ExceptionInfo RuntimeError('\'----------------------------------------------------------------------------------------------------\'.../issues\'\n\'----------------------------------------------------------------------------------------------------\'\n') tblen=14>
[gw0] FAILED test123.py::test_should_failed
replacing crashed worker gw0
[gw1] darwin Python 3.8.3 cwd: /Users/qa/PycharmProjects/qa
INTERNALERROR> Traceback (most recent call last):
INTERNALERROR> File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/_pytest/main.py", line 191, in wrap_session
INTERNALERROR> session.exitstatus = doit(config, session) or 0
INTERNALERROR> File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/_pytest/main.py", line 247, in _main
INTERNALERROR> config.hook.pytest_runtestloop(session=session)
INTERNALERROR> File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/pluggy/hooks.py", line 286, in __call__
INTERNALERROR> return self._hookexec(self, self.get_hookimpls(), kwargs)
INTERNALERROR> File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/pluggy/manager.py", line 93, in _hookexec
INTERNALERROR> return self._inner_hookexec(hook, methods, kwargs)
INTERNALERROR> File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/pluggy/manager.py", line 84, in <lambda>
INTERNALERROR> self._inner_hookexec = lambda hook, methods, kwargs: hook.multicall(
INTERNALERROR> File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/pluggy/callers.py", line 208, in _multicall
INTERNALERROR> return outcome.get_result()
INTERNALERROR> File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/pluggy/callers.py", line 80, in get_result
INTERNALERROR> raise ex[1].with_traceback(ex[2])
INTERNALERROR> File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/pluggy/callers.py", line 187, in _multicall
INTERNALERROR> res = hook_impl.function(*args)
INTERNALERROR> File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/xdist/dsession.py", line 112, in pytest_runtestloop
INTERNALERROR> self.loop_once()
INTERNALERROR> File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/xdist/dsession.py", line 135, in loop_once
INTERNALERROR> call(**kwargs)
INTERNALERROR> File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/xdist/dsession.py", line 263, in worker_runtest_protocol_complete
INTERNALERROR> self.sched.mark_test_complete(node, item_index, duration)
INTERNALERROR> File "/Users/qa/PycharmProjects/qa/venv/lib/python3.8/site-packages/xdist/scheduler/load.py", line 151, in mark_test_complete
INTERNALERROR> self.node2pending[node].remove(item_index)
INTERNALERROR> KeyError: <WorkerController gw0>
But if I change the second test like this, everything works fine:

    def test_should_failed():
        try:
            with accepted(Extra):
                __validate({"qwe": 1}, {"qwe": 2}, "")
        except:
            raise ValueError

I don't know exactly where I should file a bug/issue about this :)
Change get_reader.from_excel() to accept keyword arguments that are passed on to the call to xlrd.open_workbook() (see line 325 in get_reader.py).
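The change amounts to forwarding extra keywords. A minimal sketch of the pattern, with a stand-in stub in place of the real xlrd.open_workbook() (the stub and its parameters are illustrative, not xlrd's actual implementation):

```python
def from_excel(path, worksheet=0, **kwds):
    """Sketch: forward any extra keyword arguments to the underlying
    open call. `open_workbook` is a stand-in stub here, not real xlrd."""
    def open_workbook(filename, on_demand=False, encoding_override=None):
        return {'filename': filename,
                'on_demand': on_demand,
                'encoding_override': encoding_override}
    # Keywords like on_demand=True pass straight through without
    # from_excel() having to name each one explicitly:
    return open_workbook(path, **kwds)
```

With this shape, callers can use any option the underlying reader supports without datatest having to mirror its signature.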
It would be nice to inherit from unittest2 when it's available. This will provide a clean way to support setUpModule()
and setUpClass()
on Python 3.1 and 2.6 as well as other features of the package.
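The usual conditional-import idiom for this (a sketch; on Pythons where unittest2 is absent, the plain unittest fallback is taken):

```python
try:
    import unittest2 as unittest  # backports setUpModule()/setUpClass() to 2.6/3.1
except ImportError:
    import unittest  # modern stdlib already has these features

class ExampleTests(unittest.TestCase):
    def test_something(self):
        self.assertTrue(True)
```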
Add docstrings for new allowances--for reference, use previous allowance's docstrings (link).
The PandasSource class is currently a minimal implementation.
Need to optimize the following methods:

- filter_rows()
- distinct()
- sum()
- count()
- mapreduce()
Need to normalize the API and fix msg handling for all of the allow... context managers.
For completely local testing, this project provides its own ./run-tests.sh script (run-tests.bat
on Windows) but adding Travis CI testing would be a nice touch.
Hey Shawn - wasn't one of the problems you spoke about at PyCon 2016 guaranteeing that all integers in a list are unique, in a way that is efficient for large sets of data?
Calling MultiSource() without providing any sub-sources should raise an error. Allowing empty MultiSource objects can cause confusing error messages.
The new validate function can compare mappings of data and when differences are found, they are returned as mappings of differences. All of the assert functions must be updated to handle this new behavior.
Currently, the deviation allowances (allowDeviation and allowPercentDeviation) can only be set to ranges around zero (asserts lower <= 0 <= upper). This should be changed to allow deviations within a given range even when the entire range is above or below zero.
The lower and upper arguments should also work when set to the exact same number:

    with self.allowPercentDeviation(lower=-1.0, upper=-1.0):  # Allows only 100%
        self.assertSubjectSum('C', ['A', 'B'])                # deviations but no
                                                              # others.
Currently, the __repr__() for difference objects returns keyword-style formatting even when **kwds keys are not valid identifiers:

    >>> kwds = {'foo': 'AAA', 'bar baz': 'BBB'}
    >>> datatest.Deviation(1, 12, **kwds)
    Deviation(+1, 12, bar baz='BBB', foo='AAA')  # <- Can NOT be eval'd!

When keys are not valid identifiers, dict-style formatting should be used instead:

    >>> datatest.Deviation(1, 12, **kwds)
    Deviation(+1, 12, **{'foo': 'AAA', 'bar baz': 'BBB'})  # <- CAN be eval'd!
Simplify DataSource() loading behavior. The default __init__() method should accept a variety of formats (CSV, records, XLSX, etc.) and handle the data loading appropriately.
Make all allowance types fully composable with each other using the binary operators & and |.
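One way to get this composability is to have each allowance wrap a predicate over differences and implement __and__/__or__ to combine predicates. This is only an illustrative sketch, not datatest's actual implementation:

```python
class Acceptance:
    """Sketch of a composable allowance: wraps a predicate that returns
    True when a difference should be accepted (illustrative only)."""
    def __init__(self, predicate):
        self.predicate = predicate

    def __and__(self, other):
        # Accept a difference only if BOTH allowances accept it.
        return Acceptance(lambda d: self.predicate(d) and other.predicate(d))

    def __or__(self, other):
        # Accept a difference if EITHER allowance accepts it.
        return Acceptance(lambda d: self.predicate(d) or other.predicate(d))

small = Acceptance(lambda d: abs(d) <= 2)   # accept small numeric deviations
positive = Acceptance(lambda d: d > 0)      # accept positive deviations

either = small | positive  # union of accepted differences
both = small & positive    # intersection of accepted differences
```

Composing at the predicate level keeps & and | associative and lets arbitrary allowance types interoperate.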
Add pass-through behavior to assertValid() to help replace functionality that will be lost when removing the more magical helper methods (like assertSubjectSum(), et al).
Using the existing data/required syntax can be cumbersome in certain cases. By adding support for an optional calling convention (a function signature), duplication can be reduced. While the data/required signature will remain the default behavior (which will appear in the docstring when introspected) a second, optional signature will also be supported:
TestCase.assertValid(data, required, msg=None)
TestCase.assertValid(function, /, msg=None)
Move old implementations to backward compatibility sub-package, rename new implementations, and update wrapper methods.
Squint objects are not being evaluated properly by the datatest.validate() function:

    import datatest
    import squint

    # Create a Select object.
    select = squint.Select([['A', 'B'], ['x', 1], ['y', 2], ['z', 3]])

    # Compare data to itself--passes as expected.
    datatest.validate(
        select({'A': {'B'}}),
        select({'A': {'B'}}).fetch(),  # <- Shouldn't be necessary.
    )

    # Compare data to itself--fails, unexpectedly.
    datatest.validate(
        select({'A': {'B'}}),
        select({'A': {'B'}}),  # <- Not properly handled!
    )

In the code above, the second call to datatest.validate() should pass but, instead, fails with the following message:
Traceback (most recent call last):
File "<input>", line 3, in <module>
select({'A': {'B'}}), # <- Not properly handled!
File "~/datatest-project/datatest/validation.py", line 291, in __call__
raise err
datatest.ValidationError: does not satisfy mapping requirements (3 differences): {
'x': [Invalid(1)],
'y': [Invalid(2)],
'z': [Invalid(3)],
}
Datatest provides backwards compatibility via sub-packages in datatest.__past__. Importing a sub-package will modify datatest's behavior by applying monkey patches.
Care needs to be taken to isolate the side-effects of these imports because their default behavior is to apply global state changes to datatest, itself.
Keep an eye on wesm/dataframe-protocol#1 and see if it makes sense to change datatest's normalization to support the DataFrame protocol instead of DataFrames specifically.
Currently, as noted in the documentation, "To use DataTestCase, you must define subjectData".
This requirement made more sense before we had plans for magic removal/reduction (see #11). But as the assertions are more properly decoupled, users should have the ability to run DataTestCase methods without a baked-in subjectData source.
Add a msg=None argument to the new allowances.
In cases where msg is None but a filter function has a __doc__, use the docstring as the message.
Currently DataError's init looks like the following:

    def __init__(self, msg, differences, subject=None, required=None):
        ...

It would be nice if this were more flexible and behaved more like AssertionError, perhaps with the following signature:

    def __init__(self, *args, **kwds):
        ...

Using the above scheme, args could be walked over and any difference objects found could be put into the difference property (before passing the remaining args to super().__init__()). The keywords could be added to a kwds property, or perhaps subject and required might be added explicitly.
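A sketch of how that walk could look. Difference here is a stand-in for datatest's difference classes, and the attribute names (differences, subject, required) are illustrative, not the final API:

```python
class Difference:
    """Stand-in for datatest's difference classes (illustrative only)."""
    def __init__(self, value):
        self.value = value

class DataError(AssertionError):
    def __init__(self, *args, **kwds):
        # Walk args, pulling difference objects into their own property
        # and passing everything else on to AssertionError.
        self.differences = [a for a in args if isinstance(a, Difference)]
        remaining = [a for a in args if not isinstance(a, Difference)]
        self.subject = kwds.get('subject')
        self.required = kwds.get('required')
        super().__init__(*remaining)
```

This keeps plain-message construction (DataError('msg')) working exactly like AssertionError while letting differences ride along positionally.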
Accept **fmtparams keywords and pass them to the underlying csv.reader() call:

    subjectData = CsvReader('myfile.csv', delimiter='\t', quotechar='|')
The API for allowPercentDeviation() should allow for multiple signatures (like the allowDeviation() method):

    allowPercentDeviation(tolerance, /, msg=None, **kwds_filter)
    allowPercentDeviation(lower, upper, msg=None, **kwds_filter)
DataSource raises a confusing error message when it fails to load data because of duplicate field names.
When data has multiple columns named "x":
ValueError: Duplicate values: x
When data has multiple columns where the name is blank:
ValueError: Duplicate values:
The error message should indicate that the "values" are actually field names. And in the case of blank field names, the error message should address this more clearly rather than trying to show a blank value.
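A sketch of what a clearer check could look like (not the existing DataSource code; the function name and message wording are illustrative):

```python
from collections import Counter

def check_fieldnames(fieldnames):
    """Raise a descriptive error for duplicate or blank field names."""
    duplicates = [name for name, count in Counter(fieldnames).items()
                  if count > 1]
    if duplicates:
        # Render blank names explicitly instead of showing an empty value.
        formatted = ', '.join(repr(n) if n else '<blank>' for n in duplicates)
        raise ValueError(f'duplicate field names: {formatted}')
```

With this, the blank-name case reads "duplicate field names: <blank>" rather than trailing off after a colon.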
The assertDataUnique() method operates in-memory without optimizations (see issue #9).
As mentioned earlier (see comment), explore the idea of implementing a Bloom filter approach to solve larger-than-RAM testing for uniques.
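A minimal Bloom filter sketch (the size and hash count below are arbitrary illustrations): membership tests can yield false positives but never false negatives, so it can flag candidate duplicates without holding every value in memory.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter for larger-than-RAM uniqueness screening."""
    def __init__(self, size=1024, hashes=3):
        self.size = size
        self.hashes = hashes
        self.bits = bytearray(size)  # one byte per bit position, for clarity

    def _positions(self, item):
        # Derive `hashes` independent positions from salted SHA-256 digests.
        for i in range(self.hashes):
            digest = hashlib.sha256(f'{i}:{item}'.encode()).digest()
            yield int.from_bytes(digest[:8], 'big') % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))

# Flag possible duplicates in a stream without storing every value:
seen = BloomFilter()
duplicates = []
for value in [1, 2, 3, 2, 4, 1]:
    if value in seen:
        duplicates.append(value)  # may include rare false positives
    seen.add(value)
```

In practice the candidates flagged here would get a second, exact pass; the filter's job is only to shrink that candidate set.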
It would be nice if running datatest with the -v flag returned a data quality report appropriate for non-developers (currently, it displays unittest-style verbose output).
Validation is mishandled when data is a squint query-mapping and requirement is a non-mapping object.

    import datatest
    import squint

    select = squint.Select([
        ['A', 'B'],
        ['x', 'foo'],
        ['x', 'bar'],
        ['y', 'foo'],
        ['y', 'bar'],
    ])

    selection = select({'A': 'B'})     # <- Query returns a mapping.
    datatest.validate(selection, str)  # <- Requirement is a non-mapping object.
The example above should pass but it fails with the following error:
$ python example.py
Traceback (most recent call last):
File "example.py", line 13, in <module>
datatest.validate(selection, str)
datatest.ValidationError: does not satisfy 'str' (2 differences): [
Invalid(('x', <Result object (evaltype=list) at 0x7fd43beb>)),
Invalid(('y', <Result object (evaltype=list) at 0x7fd43bed>)),
]
Create _compare...() functions to build collection of differences for mappings, sequences, sets, etc.
OK, this ticket is a big deal and should allow for all sorts of added flexibility.
1. Create a general comparison-wrapper that wraps instances of any type (set, dict, int, str, etc.).
2. Create an assertValid() method that handles comparisons for arbitrary objects (uses the comparison-wrapper internally). This should handle object-to-object, object-to-callable, and object-to-regex comparisons.
3. Remove the assertEqual() wrapper method and use addTypeEqualityFunc() to register a comparison function for the new wrapped type.
4. Add pass-through behavior to assertValid() to replace existing magical methods (like assertSubjectSum(), et al).
5. Remove all of the assertSubject...() methods (reimplement them in the backward compatibility sub-package using assertValid()).
[September 14th and October 6th edits, below]
After exploring different approaches for implementing this, I have settled on an alternate list of steps that allows for incremental progress and a simpler backward compatibility roadmap:
1. Create _compare...() functions to build collections of differences for mappings, sequences, sets, etc. (see #24).
2. Create an assertValid() method that wraps the _compare...() functions (see #25).
3. Add pass-through behavior to assertValid() to replace the more magical helper methods like assertSubjectSum(), et al (see #26).
4. Remove the assertSubject...() methods and add them to the backward compatibility sub-package (see #27).
5. Remove the assertEqual() wrapper and add it to the backward compatibility sub-package (see #28).
6. Add _validate...() functions for faster testing (short-circuit evaluation and Boolean return values) rather than using _compare...() functions in all cases (see #29).

Using assertEqual()...
self.assertEqual(9, 10)
...returns a unittest-style traceback that reads:
Traceback (most recent call last):
File "<stdin>", line 3, in test_example
AssertionError: 9 != 10
Using assertValid()
...
self.assertValid(9, 10)
...should return a data-test style traceback:
Traceback (most recent call last):
File "<stdin>", line 3, in test_example
DataError: Deviation(-1, 10)
Currently the allow_only() context manager expects an iterable (usually a list or dict) of differences.
Add the ability to pass a single difference not wrapped in any container.
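Internally this can be a small normalization step before the existing logic runs; a sketch with a stand-in Difference class (not datatest's real types):

```python
class Difference:
    """Stand-in for datatest's difference classes (illustrative only)."""
    def __init__(self, value):
        self.value = value

def normalize(differences):
    """Accept a single difference or a container of differences and
    always return a container, so downstream code can iterate."""
    if isinstance(differences, Difference):
        return [differences]
    return differences
```

With this in place, allow_only(Deviation(+1, 10)) and allow_only([Deviation(+1, 10)]) would behave identically.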
Remove all of the assertSubject...() methods from the core API and add them to the backward compatibility sub-package.
If an exception is thrown within a test that uses the test_db_engine fixture, the pytest_runtest_makereport function crashes.
The reason is that it uses Node's deprecated get_marker function, instead of the new get_closest_marker function.
See details about this change in pytest here:
https://docs.pytest.org/en/latest/mark.html#updating-code
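A version-tolerant helper illustrating the fix (a sketch; the real pytest_runtest_makereport hook isn't reproduced here, and the stand-in item classes below are demonstration stubs, not pytest Nodes):

```python
def get_marker_compat(item, name):
    """Prefer the modern get_closest_marker() API, falling back to the
    deprecated get_marker() on old pytest versions."""
    if hasattr(item, 'get_closest_marker'):
        return item.get_closest_marker(name)
    return item.get_marker(name)

# Tiny stand-in objects for demonstration only:
class NewStyleItem:
    def get_closest_marker(self, name):
        return f'closest:{name}'

class OldStyleItem:
    def get_marker(self, name):
        return f'legacy:{name}'
```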
Shawn,
I am trying your package to see if I can validate a CSV file by reading it with pandas. I am getting Extra(nan) from dt.validate.superset() or Invalid(nan) from dt.validate(). Is there a way I can include those nan values in my validation sets?
Error looks like
E ValidationError: may contain only elements of given superset (10000 differences): [
Extra(nan),
Extra(nan),
Extra(nan),
Note: I am reading this particular column as str
E ValidationError: does not satisfy 'str' (10000 differences): [
Invalid(nan),
Invalid(nan),
Invalid(nan),
Invalid(nan),
Let me know if you find a solution or can help me debug this.
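One workaround worth trying, since float NaN never compares equal to itself (so equality and set-membership checks can't match it): replace NaN with a sentinel before validating and include the sentinel in the requirement. A sketch using None as the sentinel (the sentinel choice is an assumption, not a datatest convention):

```python
import math

def replace_nan(values, sentinel=None):
    """Map float('nan') entries to a sentinel so they can participate
    in equality and set comparisons."""
    return [sentinel if isinstance(v, float) and math.isnan(v) else v
            for v in values]

data = ['a', 'b', float('nan'), float('nan')]
cleaned = replace_nan(data)
# A requirement set can now name the sentinel explicitly, e.g.
# {'a', 'b', None}, instead of never matching NaN.
```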
Issue #7 exposes the degree of magic that is currently present in the DataTestCase methods. Removing (or at least reducing) magic where possible would make the behavior easier to understand and explain.
In cases where small amounts of magic are useful, methods should be renamed to better reflect what's happening.
This "magic" version:

    def test_active(self):
        self.assertDataSet('active', {'Y', 'N'})

...is roughly equivalent to:

    def test_active(self):
        subject = self.subject.set('active')
        self.assertEqual(subject, {'Y', 'N'})

The magic version requires detailed knowledge about the method before a newcomer can guess what's happening. The latter example is more explicit and easier to reason about.
Having said this, the magic versions of DataTestCase's methods can save a lot of typing. So what I plan to do is:
- Add assertEqual() integration (see issue #7) as well as other standard unittest methods (assertGreater(), etc.).
- Rename the magic methods to better reflect what's happening (assertDataSum() → assertSubjectSum(), etc.).

Currently, comparison objects consider 0 == None to be False, but the compare() method considers this to be True, so this discrepancy is not returned in the list of differences.
Remove the assertEqual() wrapper and add it to backward compatibility sub-package.
Add how-to documentation for inequalities.
Should demonstrate:
- validate.interval() to create left- and right-bounded intervals (for greater-than-or-equal-to and less-than-or-equal-to)
- validating values outside an interval (x < min or max < x)
It would be nice to have a command line option to disregard the @mandatory decorator when running tests.
I'm thinking -i for "ignore" is a decent option:

    -i, --ignore    Ignore @mandatory flag and run all tests
Seeing the entire API on a single page can create confusion when users aren't familiar with the differences between unittest and pytest style conventions. It's not always apparent which method or function format is appropriate and mixing these different conventions is not advisable.
The API Reference page should be split into two sections:
Add DataTestCase.assertEqual() to check CompareSet and CompareDict objects (in the same way that assertListEqual() and assertSetEqual() are used to test list and set objects).
This will allow for explicit comparison should the implicit subjectData magic prove inadequate in some situations.
Also look into support for assertGreater(), assertGreaterEqual(), assertLess(), and assertLessEqual().
When building a string for ValidationError (with either __repr__() or __str__()), show differences in sorted order.
The following should raise an error:
>>> import datatest
>>> select = datatest.Selector()
>>> select = select.load_data('nonexistent_file.csv')
The method DataTestCase.assertDataColumns() should, optionally, accept a callable required argument:

    def test_columns(self):
        def lowercase(x):  # <- Callable helper-function.
            return x == x.lower()
        self.assertDataColumns(lowercase)
Create an assertValid() method that wraps the _compare...() functions.
Currently, when a mandatory test fails, the test runner exits with the following:
$ python -m datatest
.......F
==========================================================
FAIL: <test name here>
<traceback here>
datatest.error.DataError: mandatory test failed, stopping
early: data does not satisfy object requirement:
Invalid('Some Invalid Code')
----------------------------------------------------------
Ran 8 tests in 1.764s
FAILED (failures=1)
The summary at the end should, more prominently, indicate that not all the tests were run:
$ python -m datatest
.......F
==========================================================
FAIL: test_<test name here>
<traceback here>
datatest.error.DataError: mandatory test: data does not
satisfy object requirement:
Invalid('Some Invalid Code')
----------------------------------------------------------
Ran 8 tests in 1.764s
MANDATORY TEST FAILED, STOPPING EARLY
Add conditional install instructions to .travis.yml for all optional dependencies (currently just pandas and xlrd).
Add how-to documentation for checking counts and cardinality.
Should demonstrate:
- len(data) to validate the count of data elements
- collections.Counter(data) to validate counts per value

Should also mention that cardinality is a descriptive statistic that can be calculated with many other tools a developer might use (df[0].applymap(bool).sum(), select('A').filter().count(), etc.).
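The two checks above reduce to plain Python; a minimal illustration:

```python
from collections import Counter

data = ['x', 'x', 'y', 'z']

# Validate the total count of data elements:
assert len(data) == 4

# Validate counts per value (Counter compares equal to a plain dict):
assert Counter(data) == {'x': 2, 'y': 1, 'z': 1}
```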