alleninstitute / argschema

This Python module simplifies the development of modules that need to define and validate a particular set of input parameters, while still being able to define those inputs flexibly in different ways in different contexts.

License: Other

Python 99.79% Dockerfile 0.21%

argschema's People

Contributors

russtorres samrkinn gitter-badger dyf sriharivignesh nicain nilegraddis djkapner nickponvert matthewaitken tmchartrand rhytnen

argschema's Issues

Question about argschema logger

I'm finding argschema super useful! But one question/issue regarding logging has come up.

Because the argschema_parser is one of the first things called in most main scripts, I was wondering whether it is possible to prevent argschema from instantiating a logger. I couldn't find anything about it in the documentation pages.

The argschema logger caused another logger, which I instantiated later in my main with basicConfig(), to fail silently.
(see: https://docs.python.org/2/library/logging.html#logging.basicConfig)

Unfortunately, I couldn't move my logger instantiation before invoking the argschema parser, since the log filename and some other relevant variables needed to be obtained from argschema.

I eventually found this workaround:

# Remove root handler instantiated by argschema
for handler in logging.root.handlers[:]:
    logging.root.removeHandler(handler)

Do let me know if there's a better way for me to handle the issue. I'm very new to using both the logger and argschema...
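
One alternative to removing the handlers by hand, on Python 3.8 and later, is the force argument of logging.basicConfig(), which removes and closes any existing root handlers before applying the new configuration. The sketch below is only an illustration: the first basicConfig() call is an assumption that stands in for whatever root handler argschema installs, and the logger name and message are made up.

```python
import io
import logging

# Stand-in for a library configuring the root logger first (assumption for illustration).
logging.basicConfig(level=logging.WARNING)

# A later basicConfig() call is normally a no-op once the root logger has handlers.
# With force=True (Python 3.8+), existing root handlers are removed and closed first.
stream = io.StringIO()
logging.basicConfig(
    stream=stream,
    level=logging.INFO,
    format="%(levelname)s:%(message)s",
    force=True,
)

logging.getLogger("my_module").info("configured after the parser ran")
print(stream.getvalue())
```

In a real script, filename= would replace stream=, using the log file path obtained from the parsed arguments.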

validate_input_path fails on Windows (reported) for binary reads

These lines are reported to cause problems on Windows with files that need open(fname, "rb") (for example, .npy, .h5). It seems we can fix this easily and add a test.

smart_merge fails with NumpyArray

I'm getting an error on validation when I've added a NumpyArray to an existing schema.

  File "/allen/programs/celltypes/workgroups/em-connectomics/danielk/conda/rm_production_mod/lib/python2.7/site-packages/argschema/argschema_parser.py", line 171, in __init__
    args = utils.smart_merge(jsonargs, argsdict)
  File "/allen/programs/celltypes/workgroups/em-connectomics/danielk/conda/rm_production_mod/lib/python2.7/site-packages/argschema/utils.py", line 210, in smart_merge
    elif a[key] == b[key]:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

my schema has this new entry:

corner_mask_radii = NumpyArray(
     dtype = np.int, 
     required=False,
     default=[0, 0, 0, 0],
     missing=[0, 0, 0, 0],
     description="radius of image mask corners, "
     "order (0, 0), (w, 0), (w, h), (0, h)")

argschema/argschema/utils.py

Line 210 in 2874549

elif a[key] == b[key]:

command line overrides not working for all field types in all versions of python

See PR #71 for an example. This is due to the FIELD_TYPE_MAPPING being one>many, so when it's inverted it's many>one, and some of the field_types aren't valid argparse parsing functions in Python 3 (e.g. bytes).

The solution is likely to systematically test all the field types and their command line overrides, and to build a mapping that makes sense in a Python-version-specific manner.
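
The inversion problem can be sketched with a small stand-in mapping. The table below is hypothetical and is not argschema's actual FIELD_TYPE_MAPPING; it only shows how a naive inversion can end up selecting bytes, which cannot be called on a str in Python 3 and so does not work as an argparse type converter.

```python
# Hypothetical stand-in for a type-to-field mapping; not argschema's actual table.
TYPE_TO_FIELD = {
    int: "Integer",
    float: "Float",
    str: "String",
    bytes: "String",  # two Python types share one field type
}

# Naive inversion: later entries silently overwrite earlier ones.
naive_field_to_type = {field: py_type for py_type, field in TYPE_TO_FIELD.items()}

# On Python 3, bytes("abc") raises TypeError (no encoding given),
# so bytes is not usable as an argparse `type=` converter.
try:
    naive_field_to_type["String"]("abc")
    converter_ok = True
except TypeError:
    converter_ok = False

# An explicit, hand-checked inverse avoids the ambiguity.
FIELD_TO_CLI_TYPE = {"Integer": int, "Float": float, "String": str}

print(naive_field_to_type["String"], converter_ok)
```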

test_inputdir_no_access() in test/fields/test_files.py fails on Windows

This had been working on AppVeyor, but since moving to GitHub Actions the test no longer passes.
We'd like to reinstate this test for good Windows coverage.

Having NumpyArray in schema results in warning logs about invalid type.

NumpyArray inherits from marshmallow.fields.List, and sets the list type as marshmallow.fields.Field. When the argument parser is being built, the list handling logs a warning if it can't find a type to pass to argparse which means that all NumpyArrays result in a warning since Field isn't in the type map.

Command line overrides for boolean arguments will always result in a value of True

The command line options for booleans expect an argument like any other type (they are not store_true or store_false flags). Right now the type specification given to the argument parser for a boolean is bool. The parser therefore evaluates the argument by simply calling bool(value), which returns True for any non-empty string, including "False".
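
A minimal sketch of the behaviour with plain argparse, together with one possible remedy. The str2bool converter and the accepted spellings are assumptions for illustration, not part of argschema.

```python
import argparse


def str2bool(value):
    """Hypothetical converter: map common true/false spellings to a bool."""
    text = value.strip().lower()
    if text in ("true", "t", "yes", "y", "1"):
        return True
    if text in ("false", "f", "no", "n", "0"):
        return False
    raise argparse.ArgumentTypeError("expected a boolean, got %r" % value)


parser = argparse.ArgumentParser()
parser.add_argument("--naive", type=bool)       # bool("False") is True
parser.add_argument("--parsed", type=str2bool)  # inspects the text instead

args = parser.parse_args(["--naive", "False", "--parsed", "False"])
print(args.naive, args.parsed)
```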

allow short name alternatives for setting fields from command line

Seems like it would be fairly straightforward to add the option of setting short names for fields to make them easier to specify from the command line ("-i" vs "--input"), unless I'm missing something and this is already possible?
If this would make sense to include I'd be glad to test something out and file a PR at some point.
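
For reference, argparse itself already supports this: add_argument accepts several option strings for one destination. The sketch below shows only that mechanism; how a schema field would declare its short name in argschema is left open, and the option names are made up.

```python
import argparse

parser = argparse.ArgumentParser()
# Both spellings fill the same destination, which is derived from the long option.
parser.add_argument("-i", "--input", type=str, help="input file path")
parser.add_argument("-n", "--count", type=int, default=1, help="number of repeats")

short_form = parser.parse_args(["-i", "data.json", "-n", "3"])
long_form = parser.parse_args(["--input", "data.json", "--count", "3"])
print(short_form == long_form)
```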

Marshmallow 3.0.0 just released and it breaks argschema

I just installed marshmallow version 3.0.0 and it breaks argschema. After downgrading to 2.20.1 it works fine.

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-36-b32a4e809482> in <module>
      1 # compute transformation (only need to run once)
----> 2 s2 = s3.Solve3D(input_data=s3.example2, args=[])
      3 # s2.run()

/usr/local/lib/python3.6/dist-packages/argschema/argschema_parser.py in __init__(self, input_data, schema_type, output_schema_type, args, logger_name)
    173 
    174         # validate with load!
--> 175         result = self.load_schema_with_defaults(self.schema, args)
    176 
    177         self.args = result

/usr/local/lib/python3.6/dist-packages/argschema/argschema_parser.py in load_schema_with_defaults(self, schema, args)
    272 
    273         # load the dictionary via the schema
--> 274         result = utils.load(schema, args)
    275 
    276         return result

/usr/local/lib/python3.6/dist-packages/argschema/utils.py in load(schema, d)
    416     """
    417 
--> 418     results = schema.load(d)
    419     if isinstance(results, tuple):
    420         (results, errors) = results

/usr/local/lib/python3.6/dist-packages/marshmallow/schema.py in load(self, data, many, partial, unknown)
    682         """
    683         return self._do_load(
--> 684             data, many=many, partial=partial, unknown=unknown, postprocess=True
    685         )
    686 

/usr/local/lib/python3.6/dist-packages/marshmallow/schema.py in _do_load(self, data, many, partial, unknown, postprocess)
    783             try:
    784                 processed_data = self._invoke_load_processors(
--> 785                     PRE_LOAD, data, many=many, original_data=data, partial=partial
    786                 )
    787             except ValidationError as err:

/usr/local/lib/python3.6/dist-packages/marshmallow/schema.py in _invoke_load_processors(self, tag, data, many, original_data, partial)
   1012             many=many,
   1013             original_data=original_data,
-> 1014             partial=partial,
   1015         )
   1016         return data

/usr/local/lib/python3.6/dist-packages/marshmallow/schema.py in _invoke_processors(self, tag, pass_many, data, many, original_data, **kwargs)
   1133                     data = processor(data, original_data, many=many, **kwargs)
   1134                 else:
-> 1135                     data = processor(data, many=many, **kwargs)
   1136         return data
   1137 

TypeError: make_object() got an unexpected keyword argument 'many'

better argparse sorting/description

Argparse arguments currently stay together based upon nested level, but their ordering within a nested level remains somewhat random. I'd suggest that non-required arguments get pushed to the bottom, and Nested arguments get pushed to the top.

So if I had

    import argschema

    class NestedSchema(argschema.schemas.DefaultSchema):
        aint = argschema.fields.Int(required=False, default =5, description="a integer")
        bstr = argschema.fields.Str(required=True, description = "a string")

    class MySchema(argschema.ArgSchema):
        nest = argschema.fields.Nested(NestedSchema, required=True, description="nested schema")
        topint = argschema.fields.Int(required=True,description="a top integer")
        topintb = argschema.fields.Int(default=5,required=False,description="a top integer b")
        topintc = argschema.fields.Int(required=True,description="a top integer c")
        topintd = argschema.fields.Int(default=7,required=False,description="a top integer d")

    if __name__ == '__main__':
        mod = argschema.ArgSchemaParser(schema_type=MySchema)

presently I get

    $ python test_schema.py --help
    usage: test_schema.py [-h] [--log_level LOG_LEVEL] [--nest.aint NEST.AINT]
                        [--nest.bstr NEST.BSTR] [--input_json INPUT_JSON]
                        [--topintc TOPINTC] [--topintb TOPINTB]
                        [--topintd TOPINTD] [--topint TOPINT]
                        [--output_json OUTPUT_JSON]

    optional arguments:
    -h, --help            show this help message and exit
    --log_level LOG_LEVEL
                            set the logging level of the module
    --nest.aint NEST.AINT
                            a integer
    --nest.bstr NEST.BSTR
                            a string
    --input_json INPUT_JSON
                            file path of input json file
    --topintc TOPINTC     a top integer c
    --topintb TOPINTB     a top integer b
    --topintd TOPINTD     a top integer d
    --topint TOPINT       a top integer
    --output_json OUTPUT_JSON
                            file path to output json file

whereas I think I would prefer the output to be something like this

    $ python test_schema.py --help
        usage: test_schema.py [-h] [--log_level LOG_LEVEL] [--nest.aint NEST.AINT]
                            [--nest.bstr NEST.BSTR] [--input_json INPUT_JSON]
                            [--topintc TOPINTC] [--topintb TOPINTB]
                            [--topintd TOPINTD] [--topint TOPINT]
                            [--output_json OUTPUT_JSON]

        optional arguments:                       
        --nest.aint NEST.AINT      a integer (default = 5)                        
        --nest.bstr NEST.BSTR      a string (required)       
        --topint TOPINT            a top integer (required)                   
        --topintc TOPINTC          a top integer c (required)
        --topintb TOPINTB          a top integer b (default=5)
        --topintd TOPINTD          a top integer d (default=7)
        --input_json INPUT_JSON    file path of input json file 
        --output_json OUTPUT_JSON  file path to output json file
        --log_level LOG_LEVEL      set the logging level of the module (default=WARNING)
        -h, --help                 show this help message and exit
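
One way the ordering and the annotations could be produced is to sort the flattened fields before registering them with argparse. This is a sketch under assumptions: the tuple layout is hypothetical, and the rule used here (nested first, then required, then by name) is only one reading of the proposal above.

```python
import argparse

# Hypothetical flattened field descriptions: (flag, required, default, description)
fields = [
    ("--topintb", False, 5, "a top integer b"),
    ("--nest.bstr", True, None, "a string"),
    ("--topint", True, None, "a top integer"),
    ("--nest.aint", False, 5, "a integer"),
]


def sort_key(field):
    flag, required, _default, _description = field
    nested = "." in flag
    # False sorts before True: nested first, then required, then by name.
    return (not nested, not required, flag)


parser = argparse.ArgumentParser(prog="test_schema.py")
for flag, required, default, description in sorted(fields, key=sort_key):
    note = "(required)" if required else "(default=%r)" % (default,)
    # Arguments are added in sorted order, which is the order argparse prints them.
    parser.add_argument(flag, default=default, help="%s %s" % (description, note))

ordered = [field[0] for field in sorted(fields, key=sort_key)]
print(ordered)
```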

output validation

We have set up the default arguments to include --output_json, but we haven't actually done anything with it by default.

I propose that we make an extension of ArgSchemaParser, ArgSchemaOutputParser, which standardizes how this would be done but makes its use optional for those who want it. Maybe output_json should even go with the schema_type, so that output_json isn't a default part of the ArgSchema unless you are using ArgSchemaOutputParser... but I'm a bit confused on that point.

import json

import marshmallow as mm

from argschema import ArgSchemaParser


class ArgSchemaOutputParser(ArgSchemaParser):

    def __init__(self, output_schema_type=None, *args, **kwargs):
        self.output_schema_type = output_schema_type
        super(ArgSchemaOutputParser, self).__init__(*args, **kwargs)

    def output(self, d):
        """Validate a dictionary through output_schema_type and write it
        to the output_json file path.

        Parameters
        ----------
        d : dict
            output dictionary to write to the self.args['output_json'] location

        Raises
        ------
        mm.ValidationError
            if d does not validate against output_schema_type
        """
        schema = self.output_schema_type()
        # marshmallow 2.x: dump() returns a (data, errors) pair
        (output_json, errors) = schema.dump(d)
        if len(errors) > 0:
            raise mm.ValidationError(json.dumps(errors))
        with open(self.args['output_json'], 'w') as fp:
            json.dump(output_json, fp)

Change how Lists and NumpyArrays are handled at the command line

Currently, lists and arrays are handled at the command line by setting nargs to *, which accepts zero or more arguments for the variable. If we had a schema with an item mylist that was a list of ints, we would invoke it like so at the command line:

example_program --mylist 1 2 3 4 5 6

This handling is nice and pretty simple, and allows (requires) spaces between elements of the argument. The downside is that for any more complicated arrangement of lists or arrays (for example a list of lists or a multidimensional array), it is impossible to set them at the command line. For example, if I had a schema with a list of lists of ints, all of the following attempts at invoking result in validation errors because the elements are not lists:

example_program --mylist 1 2 3 4 5 6
example_program --mylist [1 2 3] [4 5 6]
example_program --mylist [[1 2 3] [4 5 6]]
example_program --mylist [[1,2,3],[4,5,6]]

I'd like to propose using ast.literal_eval for handling lists and arrays in the argument parser. Arrays and lists would then be set at the command line as in the last of the examples above.
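
A minimal sketch of that proposal with plain argparse. The literal_list converter is a hypothetical name and the list-only check is an assumption; how the change would be wired into argschema's parser construction is not shown.

```python
import argparse
import ast


def literal_list(text):
    """Hypothetical converter: parse a Python literal and require a list."""
    try:
        value = ast.literal_eval(text)
    except (ValueError, SyntaxError):
        raise argparse.ArgumentTypeError("not a valid Python literal: %r" % text)
    if not isinstance(value, list):
        raise argparse.ArgumentTypeError("expected a list, got %r" % text)
    return value


parser = argparse.ArgumentParser(prog="example_program")
# A single argument (no nargs) carries the whole, possibly nested, list.
parser.add_argument("--mylist", type=literal_list)

args = parser.parse_args(["--mylist", "[[1,2,3],[4,5,6]]"])
print(args.mylist)
```

Note that a value containing spaces, such as "[[1, 2, 3], [4, 5, 6]]", would have to be quoted in the shell so that it arrives as one argument.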

`input_json` no longer an arg in 2.0.0a1 ?

Consider the code that follows.

argschema-1.17.5 outputs:
{'input_json': 'example.json', 'myfile': 'tmp2.txt', 'log_level': 'ERROR'}
argschema-2.0.0a1 outputs:
{'myfile': 'tmp2.txt', 'log_level': 'ERROR'}

Is this an intentional breaking change?

import argschema
import json


def write_example_json():
    example = {
            'myfile': "tmp2.txt"
            }
    json_fname = "example.json"
    with open(json_fname, "w") as f:
        json.dump(example, f)
    return json_fname


class MySchema(argschema.ArgSchema):
    myfile = argschema.fields.OutputFile(
        required=True,
        description="")


class MyClass(argschema.ArgSchemaParser):
    default_schema = MySchema

    def run(self):
        print(self.args)


if __name__ == "__main__":
    inj = write_example_json()
    args = {
            'input_json': inj
            }
    m = MyClass(args=['--input_json', inj])
    m.run()

nested schemas broken in latest version

import argschema as ags

class ModelFit(ags.schemas.DefaultSchema):
    fit_type = ags.fields.Str(description="")
    hof_fit = ags.fields.InputFile(description="")
    hof = ags.fields.InputFile(description="")

class PopulationSelectionPaths(ags.schemas.DefaultSchema):
    fits = ags.fields.Nested(ModelFit, description="", many=True)

class PopulationSelectionParameters(ags.ArgSchema):
    paths = ags.fields.Nested(PopulationSelectionPaths)

The schema above, for example, can't be read in from my --input_json in the latest version.

CLI option --output_json is not an override flag

ArgSchemaParser.output(d, <path>) will write to <path> even if --output_json <different-path> is passed at the command line. This seems inconsistent with the way inputs are handled, where a command-line override will take precedence over the input dictionary that is passed to ArgSchemaParser, or even the inputs in a json file.

Test suite doesn't run in Windows

Currently the tests are configured to run with --boxed which explicitly calls os.fork() and results in pytest throwing an error because fork is not supported on Windows.

After removing the --boxed flag, test_files.test_output_file_no_write failed and test_files.test_output_dir_bad_permission hung indefinitely.

python 2/3 compatibility

argschema.fields.Str() seems to return unicode instead of str in python 2.7.13

script:

import sys
from argschema import ArgSchema, ArgSchemaParser, fields

class MySchema(ArgSchema):
    string = fields.Str()

def test_string(string):
    if isinstance(string, str):
        print "PASSED!\ntype :: %s" % type(string)
    else:
        print "FAILED!\ntype :: %s" % type(string)

if __name__ == "__main__":
    print "python ::\n%s\n" % sys.version
    print "Test\n--------"
    mod = ArgSchemaParser(schema_type=MySchema)
    test_string(mod.args["string"])

output:

$ python argschema_issue.py --string test
python ::
2.7.13 |Continuum Analytics, Inc.| (default, Dec 20 2016, 23:09:15) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]

Test
--------
FAILED!
type :: <type 'unicode'>

unknown validation error when using OutputDir on a cifs/samba share

maybe also on an output file

argschema/argparse doesn't complain about unexpected arguments

In particular, I just got stuck for a little while because I was running
python my_script.py path_to_json
rather than
python my_script.py --input_json path_to_json

and argschema was happy to continue because my script falls back to an example input, so I wasn't getting the right behavior. It seems to me that if the user passes extra arguments that argparse and/or argschema isn't handling, it should at least emit a warning, if not an error.

What do other people think?
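
For comparison, plain argparse offers both behaviours: parse_args() exits with an error on unrecognized arguments, while parse_known_args() returns them so the caller can decide what to do. The sketch below shows only these two argparse calls; which one argschema uses internally, and where a warning would best be emitted, is not assumed here.

```python
import argparse
import warnings

parser = argparse.ArgumentParser(prog="my_script.py")
parser.add_argument("--input_json")

# Lenient: leftovers are returned instead of causing an exit.
args, extras = parser.parse_known_args(["path_to_json"])
if extras:
    warnings.warn("ignoring unexpected arguments: %s" % " ".join(extras))

# Strict: parse_args() reports the stray argument and raises SystemExit.
try:
    parser.parse_args(["path_to_json"])
    strict_exit_code = 0
except SystemExit as exc:
    strict_exit_code = exc.code
```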

Unable to add additional parser arguments:

I have an entry point for which I want additional arguments, beyond those used by argschema:

> foo --version

> foo --input_json blah.json

Because the argument parser is instantiated in utils, and the arguments are parsed immediately when ArgSchemaParser is constructed, I have no option to inject this argument.
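
One possible workaround, sketched under assumptions, is to pre-parse the wrapper-only flags with a separate argparse parser and hand the remaining arguments on through the args parameter of the ArgSchemaParser constructor. The split_cli helper is a hypothetical name, and the hand-off is shown only as a comment.

```python
import argparse


def split_cli(argv):
    """Separate wrapper-only flags from the arguments meant for the main parser."""
    pre = argparse.ArgumentParser(add_help=False)  # leave -h/--help to the main parser
    pre.add_argument("--version", action="store_true")
    own, remaining = pre.parse_known_args(argv)
    return own, remaining


own, remaining = split_cli(["--version"])
own2, remaining2 = split_cli(["--input_json", "blah.json"])

# `remaining2` could then be passed on, e.g. ArgSchemaParser(args=remaining2, ...)
print(own.version, remaining, own2.version, remaining2)
```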

auto documentation

My adventures in documenting argschema led me to the thought that it should be possible to automate the writing of Sphinx style documentation for any module that uses argschema. It can describe the format of input parameters completely with human readable descriptions if provided in the Schemas. I think this can be implemented by using hooks available in sphinx-autodoc to catch all the Schema, ArgSchema and ArgSchemaParser schemas which pass through it.

Recursively defined schemas result in an infinite loop

If you pass ArgSchemaParser a schema which contains recursive elements, then the load_with_defaults functions will result in an infinite loop.

post_load error with marshmallow update 2.20.5 -> 3.0.0rc6

code that follows produces:

marhsmallow 2.20.5
testing MyClass1
{'xid': 1, 'log_level': 'ERROR'}
MyClass1 passed
testing MyClass2
{'xid': 1, 'log_level': 'ERROR'}
MyClass2 passed

and:

marhsmallow 3.0.0rc6
testing MyClass1
{'log_level': 'ERROR', 'xid': 1}
MyClass1 passed
testing MyClass2
MyClass2 failed
'NoneType' object has no attribute 'get'

import argschema
import marshmallow as mm


example1 = {
        'xid': 1,
        }


class MySchema1(argschema.ArgSchema):
    xid = argschema.fields.Int(required=True)


class MySchema2(argschema.ArgSchema):
    xid = argschema.fields.Int(required=True)
    
    @mm.post_load
    def my_post(self, data):
        pass


class MyClass1(argschema.ArgSchemaParser):
    default_schema = MySchema1
    def run(self):
        print(self.args)
    

class MyClass2(argschema.ArgSchemaParser):
    default_schema = MySchema2
    def run(self):
        print(self.args)
    

if __name__ == "__main__":
    print("marhsmallow {}".format(mm.__version__))
    for myclass in [MyClass1, MyClass2]:
        print("testing {}".format(myclass.__name__))
        try:
            mb = myclass(input_data=example1, args=[])
            mb.run()
        except Exception as e:
            print("{} failed".format(myclass.__name__))
            print(e)
        else:
            print("{} passed".format(myclass.__name__))

argschema not handling non-required Nested schemas with required fields well

import argschema
import marshmallow as mm


class MyNested(argschema.schemas.DefaultSchema):
    a = argschema.fields.Int(required=True)
    b = argschema.fields.Str(required=True)
    c = argschema.fields.Str(required=False,default='c')

class MySchema(argschema.ArgSchema):
    nested = argschema.fields.Nested(MyNested, only=['a', 'b'],
        required=False, default=mm.missing)

def test_nested_example():
    mod = argschema.ArgSchemaParser(schema_type=MySchema,
                                    args=[])
    # membership is tested on the args dict itself (keys is a method, not a container)
    assert 'nested' not in mod.args

def test_nested_marshmallow_example():
    schema = MySchema()
    (result,errors)=schema.load({})
    assert(len(errors)==0)

test_nested_example does not pass, but test_nested_marshmallow_example does. If the user wants to specify a Nested schema which is optional, but if filled in has some required fields, we aren't handling that well now.

Compatibility for marshmallow >=3.0.0b7

This should be an easy fix; I just wanted to get this up here in case anyone else gets stuck before a fix gets in.

Marshmallow changed the interface for the Schema.load() and Schema.dump() methods so argschema_parser's __init__ is broken. From mm documentation: "Changed in version 3.0.0b7: This method returns the deserialized data rather than a (data, errors) duple. A ValidationError is raised if invalid data are passed."

 line 178, in __init__
    if len(result.errors) > 0:
AttributeError: 'dict' object has no attribute 'errors'