Giter VIP home page Giter VIP logo

Comments (8)

horejsek avatar horejsek commented on July 18, 2024 3

I don't follow your thoughts much. Anyway, my idea is to simply add some context to the exception so you can do something reasonable with it. Something like:

try:
    validate(data)
except JsonSchemaException as exc:
    assert exc.rule == "maximum"
    assert exc.rule_definition == 24
    assert exc.path == "property.subproperty[2]" 
    # or maybe ["property", "subproperty", "2"], maybe both
    assert exc.value == 42
    assert exc.message == "property.subproperty[2] must be smaller than 24"

from python-fastjsonschema.

horejsek avatar horejsek commented on July 18, 2024 2

Good idea! Probably when I will be doing #36, I will do this one with it. :-)

from python-fastjsonschema.

horejsek avatar horejsek commented on July 18, 2024 2

Thanks for the tip. Something have to be definitely done, there is another issue #72. Unfortunately I cannot guarantee I will make it done till end of the year.

from python-fastjsonschema.

kapilt avatar kapilt commented on July 18, 2024

I've been noodling on how to do this, as i've had to do some extensive customization on other jsonschema libraries to get human readable (aka end user) translation. i think some of this is application specific, but definitely some additional information in the exception would be helpful. Unfortunately for true customization towards human readable errors (ie the app specific kind), the only way i can see in the existing setup to get there is to catch the exception and use schema knowledge in conjunction with stack walking to get to something. ie typically i use absolute path (array) from the top of the document to the item that error'd to get to the most user usable context. There's some consideration here of how the library validates that also causes concern, because it effectively does a for loop with try/except over methods, we end up losing the specific error matching error, for a more generic one from the enclosing block, i'm not sure how to resolve that.

from python-fastjsonschema.

kapilt avatar kapilt commented on July 18, 2024

thought about it a bit more, two different facilities i think are needed, one is adding a context object to the downward call tree, or some mechanism of capturing the hierarchy for leaf callers to use when raising an exception. the other is at a given stack/level evaluation capturing the current for loop/if/else chain errors in a transient var thats also captured in a fall off the end of the list/chain exception. for best guess on error (app independent) its probably a view to the deepest chained exception, and for app specific this carries enough context in the exception to avoid the stack walking/foraging in an except handler. Curious for feedback on this or other possible mechanisms.

fwiw, here's an example i do with another lib to enrich with semantics for end user, its fairly tightly coupled, but still indicative of determining best human message with additional schema knowledge across a set of leaf node errors. https://github.com/cloud-custodian/cloud-custodian/blob/master/c7n/schema.py#L93

from python-fastjsonschema.

kapilt avatar kapilt commented on July 18, 2024

i think adding in the additional path information would be useful, but i don't think it really addresses some of the use cases. at the moment fast jsonschema when it hits an anyOf, allOf, or oneOf will effectively drop leaf exceptions and generate a new one at the container schema element. i'm generally working with large schemas (1.2mb) and small documents (1-2k), the schema has a lot of elements under am anyOf block, getting a useful error message to a human requires walking/evaluating the leaf error messages (by schema tree leaf depth) against the schema to give something for a human to consume that they can correct their data.

so coming up with a simple example https://gist.github.com/kapilt/0fe1116df1cec10921d23559de5cae7d

the default error raised by fastjsonschema on trunk against this sample data is simply

data.policies[0] must be valid by one of anyOf definition

the additional attributes that were added to the exception, unfortunately aren't useful in presenting to the user why specifically there was a failure because it lacks the context of any the previously caught exceptions.

(Pdb) dir(e)
['__cause__', '__class__', '__context__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setstate__', '__sizeof__', '__str__', '__subclasshook__', '__suppress_context__', '__traceback__', '__weakref__', 'args', 'definition', 'message', 'name', 'path', 'rule', 'rule_definition', 'value', 'with_traceback']
(Pdb) e.rule
'anyOf'
(Pdb) e.path
['data', 'policies', '0']
(Pdb) e.message
'data.policies[0] must be valid by one of anyOf definition'
(Pdb) e.value
{'name': 'team-tag-ebs-snapshot-audit', 'resource': 'ebs-snapshot', 'description': 'Cloud Custodian EBS Snapshot Team Tag Audit\n', 'comments': 'Copy EBS volume tag with key "Team" to EBS snapshots\n', 'filters': [{'or': [{'tag:Team': 'empty'}, {'tag:Team': 'absent'}]}], 'actions': [{'type': 'copy-related-tag', 'resource': 'ebs', 'skip_missing': True, 'key': 'VolumeId', 'tags': 'Team'}]}

while the default jsonschema error output is effectively similiar, the exception has enough context information from the leaf node exceptions that we're able to produce a relevant human semantic error message of


Semantic Error
'Team' is not valid under any of the given schemas

Failed validating 'oneOf' in schema[2]['properties']['tags']:
    {'oneOf': [{'enum': ['*']}, {'type': 'array'}]}

On instance['tags']:
    'Team'

from python-fastjsonschema.

horejsek avatar horejsek commented on July 18, 2024

@kapilt What about properties rule and rule_definition?

from python-fastjsonschema.

kapilt avatar kapilt commented on July 18, 2024

rule and rule_definition are helpful for leaf schema elements to capture why. for container schema elements, their not particularly helpful. ie in the example code above, looking at rule definition simply provides a list of all underlying schema container elements without providing any context of which one failed to validate. of course of all of them did, but understanding which one of the child elements got closest to matching is key to presenting something useful to the user.

(Pdb) e.rule
'anyOf'
 e.rule_definition
(Pdb) [{'$ref': '#/definitions/resources/aws.iam-group/policy'}, {'$ref': '#/definitions/resources/aws.iam-role/policy'}, {'$ref': '#/definitions/resources/aws.iam-user/policy'},
{'$ref': '#/definitions/resources/aws.iam-policy/policy'}, {'$ref': '#/definitions/resources/aws.iam-profile/policy'}, {'$ref': '#/definitions/resources/aws.iam-certificate/policy'}, {'$ref': '#/definitions/resources/aws.account/policy'}, {'$ref': '#/definitions/resources/aws.acm-certificate/policy'}, {'$ref': '#/definitions/resources/aws.ami/policy'}, {'$ref': '#/definitions/resources/aws.rest-account/policy'}, {'$ref': '#/definitions/resources/aws.rest-api/policy'}, {'$ref': '#/definitions/resources/aws.rest-stage/policy'}, {'$ref': '#/definitions/resources/aws.rest-resource/policy'}, {'$ref': '#/definitions/resources/aws.rest-vpclink/policy'}, {'$ref': '#/definitions/resources/aws.shield-protection/policy'}, {'$ref': '#/definitions/resources/aws.shield-attack/policy'}, {'$ref': '#/definitions/resources/aws.app-elb/policy'}, {'$ref': '#/definitions/resources/aws.app-elb-target-group/policy'}, {'$ref': '#/definitions/resources/aws.ec2/policy'}, {'$ref': '#/definitions/resources/aws.launch-template-version/policy'}, {'$ref': '#/definitions/resources/aws.ec2-reserved/policy'}, {'$ref': '#/definitions/resources/aws.asg/policy'}, {'$ref': '#/definitions/resources/aws.launch-config/policy'}, {'$ref': '#/definitions/resources/aws.lambda/policy'}, {'$ref': '#/definitions/resources/aws.lambda-layer/policy'}, {'$ref': '#/definitions/resources/aws.backup-plan/policy'}, {'$ref': '#/definitions/resources/aws.batch-compute/policy'}, {'$ref': '#/definitions/resources/aws.batch-definition/policy'}, {'$ref': '#/definitions/resources/aws.cfn/policy'}, {'$ref': '#/definitions/resources/aws.distribution/policy'}, {'$ref': '#/definitions/resources/aws.streaming-distribution/policy'}, {'$ref': '#/definitions/resources/aws.cloudsearch/policy'}, {'$ref': '#/definitions/resources/aws.cloudtrail/policy'}, {'$ref': '#/definitions/resources/aws.codecommit/policy'}, {'$ref': '#/definitions/resources/aws.codebuild/policy'}, {'$ref': '#/definitions/resources/aws.codepipeline/policy'}, {'$ref': '#/definitions/resources/aws.identity-pool/policy'}, {'$ref': '#/definitions/resources/aws.user-pool/policy'}, {'$ref': '#/definitions/resources/aws.config-recorder/policy'}, {'$ref': '#/definitions/resources/aws.config-rule/policy'}, {'$ref': '#/definitions/resources/aws.alarm/policy'}, {'$ref': '#/definitions/resources/aws.event-rule/policy'}, {'$ref': '#/definitions/resources/aws.event-rule-target/policy'}, {'$ref': '#/definitions/resources/aws.log-group/policy'}, {'$ref': '#/definitions/resources/aws.directory/policy'}, {'$ref': '#/definitions/resources/aws.cloud-directory/policy'}, {'$ref': '#/definitions/resources/aws.directconnect/policy'}, {'$ref': '#/definitions/resources/aws.dlm-policy/policy'}, {'$ref': '#/definitions/resources/aws.dms-instance/policy'}, {'$ref': '#/definitions/resources/aws.dms-endpoint/policy'}, {'$ref': '#/definitions/resources/aws.dynamodb-table/policy'}, {'$ref': '#/definitions/resources/aws.dynamodb-backup/policy'}, {'$ref': '#/definitions/resources/aws.dynamodb-stream/policy'}, {'$ref': '#/definitions/resources/aws.dax/policy'}, {'$ref': '#/definitions/resources/aws.datapipeline/policy'}, {'$ref': '#/definitions/resources/aws.kms/policy'}, {'$ref': '#/definitions/resources/aws.kms-key/policy'}, {'$ref': '#/definitions/resources/aws.ebs-snapshot/policy'}, {'$ref': '#/definitions/resources/aws.ebs/policy'}, {'$ref': '#/definitions/resources/aws.ecr/policy'}, {'$ref': '#/definitions/resources/aws.ecs/policy'}, {'$ref': '#/definitions/resources/aws.ecs-service/policy'}, {'$ref': '#/definitions/resources/aws.ecs-task/policy'}, {'$ref': '#/definitions/resources/aws.ecs-task-definition/policy'}, {'$ref': '#/definitions/resources/aws.ecs-container-instance/policy'}, {'$ref': '#/definitions/resources/aws.efs/policy'}, {'$ref': '#/definitions/resources/aws.efs-mount-target/policy'}, {'$ref': '#/definitions/resources/aws.cache-cluster/policy'}, {'$ref': '#/definitions/resources/aws.cache-subnet-group/policy'}, {'$ref': '#/definitions/resources/aws.cache-snapshot/policy'}, {'$ref': '#/definitions/resources/aws.elasticbeanstalk/policy'}, {'$ref': '#/definitions/resources/aws.elasticbeanstalk-environment/policy'}, {'$ref': '#/definitions/resources/aws.elasticsearch/policy'}, {'$ref': '#/definitions/resources/aws.elb/policy'}, {'$ref': '#/definitions/resources/aws.eks/policy'}, {'$ref': '#/definitions/resources/aws.emr/policy'}, {'$ref': '#/definitions/resources/aws.gamelift-build/policy'}, {'$ref': '#/definitions/resources/aws.gamelift-fleet/policy'}, {'$ref': '#/definitions/resources/aws.glacier/policy'}, {'$ref': '#/definitions/resources/aws.glue-connection/policy'}, {'$ref': '#/definitions/resources/aws.glue-dev-endpoint/policy'}, {'$ref': '#/definitions/resources/aws.glue-job/policy'}, {'$ref': '#/definitions/resources/aws.glue-crawler/policy'}, {'$ref': '#/definitions/resources/aws.glue-database/policy'}, {'$ref': '#/definitions/resources/aws.glue-table/policy'}, {'$ref': '#/definitions/resources/aws.health-event/policy'}, {'$ref': '#/definitions/resources/aws.cloudhsm-cluster/policy'}, {'$ref': '#/definitions/resources/aws.hsm/policy'}, {'$ref': '#/definitions/resources/aws.hsm-hapg/policy'}, {'$ref': '#/definitions/resources/aws.hsm-client/policy'}, {'$ref': '#/definitions/resources/aws.iot/policy'}, {'$ref': '#/definitions/resources/aws.kafka/policy'}, {'$ref': '#/definitions/resources/aws.kinesis/policy'}, {'$ref': '#/definitions/resources/aws.firehose/policy'}, {'$ref': '#/definitions/resources/aws.kinesis-analytics/policy'}, {'$ref': '#/definitions/resources/aws.lightsail-instance/policy'}, {'$ref': '#/definitions/resources/aws.lightsail-db/policy'}, {'$ref': '#/definitions/resources/aws.lightsail-elb/policy'}, {'$ref': '#/definitions/resources/aws.ml-model/policy'}, {'$ref': '#/definitions/resources/aws.message-broker/policy'}, {'$ref': '#/definitions/resources/aws.opswork-stack/policy'}, {'$ref': '#/definitions/resources/aws.opswork-cm/policy'}, {'$ref': '#/definitions/resources/aws.rds/policy'}, {'$ref': '#/definitions/resources/aws.rds-subscription/policy'}, {'$ref': '#/definitions/resources/aws.rds-snapshot/policy'}, {'$ref': '#/definitions/resources/aws.rds-subnet-group/policy'}, {'$ref': '#/definitions/resources/aws.rds-reserved/policy'}, {'$ref': '#/definitions/resources/aws.rds-param-group/policy'}, {'$ref': '#/definitions/resources/aws.rds-cluster-param-group/policy'}, {'$ref':
'#/definitions/resources/aws.rds-cluster/policy'}, {'$ref': '#/definitions/resources/aws.rds-cluster-snapshot/policy'}, {'$ref': '#/definitions/resources/aws.redshift/policy'}, {'$ref': '#/definitions/resources/aws.redshift-subnet-group/policy'}, {'$ref': '#/definitions/resources/aws.redshift-snapshot/policy'}, {'$ref': '#/definitions/resources/aws.hostedzone/policy'}, {'$ref': '#/definitions/resources/aws.healthcheck/policy'}, {'$ref': '#/definitions/resources/aws.rrset/policy'}, {'$ref': '#/definitions/resources/aws.r53domain/policy'}, {'$ref': '#/definitions/resources/aws.s3/policy'}, {'$ref': '#/definitions/resources/aws.sagemaker-notebook/policy'}, {'$ref': '#/definitions/resources/aws.sagemaker-job/policy'}, {'$ref': '#/definitions/resources/aws.sagemaker-transform-job/policy'}, {'$ref': '#/definitions/resources/aws.sagemaker-endpoint/policy'}, {'$ref': '#/definitions/resources/aws.sagemaker-endpoint-config/policy'}, {'$ref': '#/definitions/resources/aws.sagemaker-model/policy'}, {'$ref': '#/definitions/resources/aws.secrets-manager/policy'}, {'$ref': '#/definitions/resources/aws.step-machine/policy'}, {'$ref': '#/definitions/resources/aws.simpledb/policy'}, {'$ref': '#/definitions/resources/aws.snowball-cluster/policy'}, {'$ref': '#/definitions/resources/aws.snowball/policy'}, {'$ref': '#/definitions/resources/aws.sns/policy'}, {'$ref': '#/definitions/resources/aws.storage-gateway/policy'}, {'$ref': '#/definitions/resources/aws.sqs/policy'}, {'$ref': '#/definitions/resources/aws.ssm-parameter/policy'}, {'$ref': '#/definitions/resources/aws.ssm-managed-instance/policy'}, {'$ref': '#/definitions/resources/aws.ssm-activation/policy'}, {'$ref': '#/definitions/resources/aws.ops-item/policy'}, {'$ref': '#/definitions/resources/aws.support-case/policy'}, {'$ref': '#/definitions/resources/aws.vpc/policy'}, {'$ref': '#/definitions/resources/aws.subnet/policy'}, {'$ref': '#/definitions/resources/aws.security-group/policy'}, {'$ref': '#/definitions/resources/aws.eni/policy'}, {'$ref': '#/definitions/resources/aws.route-table/policy'}, {'$ref':
'#/definitions/resources/aws.transit-gateway/policy'}, {'$ref': '#/definitions/resources/aws.transit-attachment/policy'}, {'$ref': '#/definitions/resources/aws.peering-connection/policy'}, {'$ref': '#/definitions/resources/aws.network-acl/policy'}, {'$ref': '#/definitions/resources/aws.network-addr/policy'}, {'$ref': '#/definitions/resources/aws.customer-gateway/policy'}, {'$ref': '#/definitions/resources/aws.internet-gateway/policy'}, {'$ref': '#/definitions/resources/aws.nat-gateway/policy'}, {'$ref': '#/definitions/resources/aws.vpn-connection/policy'}, {'$ref': '#/definitions/resources/aws.vpn-gateway/policy'}, {'$ref': '#/definitions/resources/aws.vpc-endpoint/policy'}, {'$ref': '#/definitions/resources/aws.key-pair/policy'}, {'$ref': '#/definitions/resources/aws.waf/policy'}, {'$ref': '#/definitions/resources/aws.waf-regional/policy'}, {'$ref': '#/definitions/resources/aws.fsx/policy'}, {'$ref': '#/definitions/resources/aws.fsx-backup/policy'}, {'$ref': '#/definitions/resources/aws.workspaces/policy'}]  

rule_definition is actually slightly problematic for other reasons on large container elements as the schema gets duplicated several times via the exception blocks. for our schema (1.2mb) the generated code size when from 4mb to 8mb with this change. I think it would be better to store the schema once in the generated code and use an expression (jmespath/jsonpath) pointer to the data structure, similar to what we do with the data being validated (ie e.path), else we have significant data duplication in the generated code.

I've explored modifying the generator on container schema elements (oneOf, anyOf, allOf) to collect leaf exceptions and propagate them in container element exceptions, which allows for a depth search to the most meaningful exception node. The project I'm working on is open source (GitHub.com/cloud-custodian/cloud-custodian) and this is prelim exploration as we're waiting till we can drop 2.7 support at end of year, till we can switch to fastjsonschema, but coming up with a way to provide good error messages to users is a key consideration.

$ git diff
diff --git a/fastjsonschema/draft04.py b/fastjsonschema/draft04.py
index 824a870..f7dc50b 100644
--- a/fastjsonschema/draft04.py
+++ b/fastjsonschema/draft04.py
@@ -143,16 +143,17 @@ class CodeGeneratorDraft04(CodeGenerator):
         Valid values for this definition are 3, 4, 5, 10, 11, ... but not 8 for example.                                                                                   
         """
         self.l('{variable}_any_of_count = 0')
+        self.l('{variable}_leaf_exceptions = []')
         for definition_item in self._definition['anyOf']:
             # When we know it's passing (at least once), we do not need to do another expensive try-except.                                                                
             with self.l('if not {variable}_any_of_count:', optimize=False):
                 with self.l('try:'):
                     self.generate_func_code_block(definition_item, self._variable, self._variable_name, clear_variables=True)                                              
                     self.l('{variable}_any_of_count += 1')
-                self.l('except JsonSchemaException: pass')
+                self.l('except JsonSchemaException as e: {variable}_leaf_exceptions.append(e)')                                                                            

         with self.l('if not {variable}_any_of_count:', optimize=False):
-            self.exc('{name} must be valid by one of anyOf definition', rule='anyOf')                                                                                      
+            self.exc('{name} must be valid by one of anyOf definition', rule='anyOf', exceptions='%s_leaf_exceptions' % self._variable)                                    

     def generate_one_of(self):
         """
diff --git a/fastjsonschema/exceptions.py b/fastjsonschema/exceptions.py
index b555356..200a4cc 100644
--- a/fastjsonschema/exceptions.py
+++ b/fastjsonschema/exceptions.py
@@ -20,13 +20,14 @@ class JsonSchemaException(ValueError):
         Added all extra properties.
     """

-    def __init__(self, message, value=None, name=None, definition=None, rule=None):
+    def __init__(self, message, value=None, name=None, definition=None, rule=None, exceptions=()):                                                                         
         super().__init__(message)
         self.message = message
         self.value = value
         self.name = name
         self.definition = definition
         self.rule = rule
+        self.exceptions = exceptions

     @property
     def path(self):
diff --git a/fastjsonschema/generator.py b/fastjsonschema/generator.py
index c5916c5..f47d526 100644
--- a/fastjsonschema/generator.py
+++ b/fastjsonschema/generator.py
@@ -239,11 +239,11 @@ class CodeGenerator:
         """
         return str(string).replace('"', '\\"')

-    def exc(self, msg, *args, rule=None):
+    def exc(self, msg, *args, rule=None, exceptions=None):
         """
         """
-        msg = 'raise JsonSchemaException("'+msg+'", value={variable}, name="{name}", definition={definition}, rule={rule})'                                                
-        self.l(msg, *args, definition=repr(self._definition), rule=repr(rule))
+        msg = 'raise JsonSchemaException("'+msg+'", value={variable}, name="{name}", definition={definition}, rule={rule}, exceptions={exceptions})'                       
+        self.l(msg, *args, definition=repr(self._definition), rule=repr(rule), exceptions=exceptions)                                                                      

     def create_variable_with_length(self):
         """

from python-fastjsonschema.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.