horejsek / python-fastjsonschema
Fast JSON schema validator for Python.
Home Page: https://horejsek.github.io/python-fastjsonschema/
License: BSD 3-Clause "New" or "Revised" License
The internal code generator creates variables based on property names, but in a few cases these conflict with the internal bookkeeping variable names.
Here's a snippet from poking around in pdb on a generated file:
data_keys = set(data.keys())
if "type" in data_keys:
    data_keys.remove("type")
    data_type = data["type"]
    if data_type not in ['progagated-tags', 'propagated-tags']:
        raise JsonSchemaException("data.type must be one of ['progagated-tags', 'propagated-tags']")
if "keys" in data_keys:
    data_keys.remove("keys")
    data_keys = data["keys"]
    if not isinstance(data_keys, (list)):
        raise JsonSchemaException("data.keys must be array")
the data in this case was
{'keys': ['ABC', 'BCD'],
'match': False,
'propagate': True,
'type': 'propagated-tags'}
and the relevant schema snippet
properties:
  keys:
    items:
      type: string
    type: array
  match:
    type: boolean
  propagate:
    type: boolean
  type:
    enum:
    - progagated-tags
    - propagated-tags
required:
- type
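The effect of the clash can be reproduced without the library. The sketch below replays the relevant generated statements by hand on the reported data; the helper `collides` and the set `BOOKKEEPING_SUFFIXES` are hypothetical names used only for illustration and are not part of fastjsonschema.

```python
# Hand-written replay of the generated statements from the report.
data = {"keys": ["ABC", "BCD"], "match": False, "propagate": True, "type": "propagated-tags"}

data_keys = set(data.keys())      # bookkeeping: property names not handled yet
if "keys" in data_keys:
    data_keys.remove("keys")
    data_keys = data["keys"]      # the property value overwrites the bookkeeping set

# Later generated code still expects a set of remaining property names.
print(type(data_keys).__name__)
print("match" in data_keys)

# Hypothetical check a generator could run before emitting "<parent>_<property>".
BOOKKEEPING_SUFFIXES = {"keys", "len", "is_dict"}  # suffixes seen in generated code


def collides(parent, prop):
    """True when the property variable would shadow a bookkeeping variable."""
    return prop in BOOKKEEPING_SUFFIXES and "{}_{}".format(parent, prop) != ""


print(collides("data", "keys"))
print(collides("data", "match"))
```

After the assignment, `data_keys` is the list from the document rather than the set of unhandled property names, so any later check that relies on it sees the wrong contents.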
Hi,
While looking into a different issue, I generated validation code as described in the documentation.
I am getting name 'REGEX_PATTERNS' is not defined errors because the definition is missing from the generated code. I came up with a small example to reproduce:
example.schema.json
{
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "https://example.com/example.schema.json",
"title": "Example",
"description": "An example schema",
"type": "object",
"properties": {
"host": {
"type": "string",
"description": "A hostname",
"format": "hostname"
}
},
"required": [ "host" ]
}
code_gen.py
import json
import fastjsonschema

with open("example.schema.json") as schema_file:
    schema = json.load(schema_file)
code = fastjsonschema.compile_to_code(schema)
with open("example_validation_code.py", "w") as f:
    f.write(code)
Running code_gen.py results in:
example_validation_code.py
VERSION = "2.0"
from fastjsonschema import JsonSchemaException
NoneType = type(None)
def validate_https___example_com_example_schema_json(data):
    if not isinstance(data, (dict)):
        raise JsonSchemaException("data must be object")
    data_is_dict = isinstance(data, dict)
    if data_is_dict:
        data_len = len(data)
        if not all(prop in data for prop in ['host']):
            raise JsonSchemaException("data must contain ['host'] properties")
    if data_is_dict:
        data_keys = set(data.keys())
        if "host" in data_keys:
            data_keys.remove("host")
            data_host = data["host"]
            if not isinstance(data_host, (str)):
                raise JsonSchemaException("data.host must be string")
            if isinstance(data_host, str):
                if not REGEX_PATTERNS["hostname_re_pattern"].match(data_host):
                    raise JsonSchemaException("data.host must be hostname")
    return data
As you can see, REGEX_PATTERNS is referenced but never defined, leading to the "undefined" errors.
I'm using python3.6 and version 2.0 of the library.
Also: when removing "format": "hostname" from the schema, an empty REGEX_PATTERNS dict appears in the generated code.
Given a schema like
properties:
  period:
    exclusiveMaximum: true
    exclusiveMinimum: true
    maximum: 1209600
    minimum: 60
    type: integer
  type:
    enum:
    - set-retention-period
required:
- type
the generated code includes this clause for period
if "period" in data_keys:
    data_keys.remove("period")
    data_period = data["period"]
    if not isinstance(data_period, (int)) and not (isinstance(data_period, float) and data_period.is_integer()) or isinstance(data_period, bool):
        raise JsonSchemaException("data.period must be integer")
    if isinstance(data_period, (int, float)):
        if data_period <= 60:
            raise JsonSchemaException("data.period must be bigger than 60")
    if isinstance(data_period, (int, float)):
        if data_period >= 1209600:
            raise JsonSchemaException("data.period must be smaller than 1209600")
    if isinstance(data_period, (int, float)):
        if data_period <= True:
            raise JsonSchemaException("data.period must be bigger than True")
    if isinstance(data_period, (int, float)):
        if data_period >= True:
            raise JsonSchemaException("data.period must be smaller than True")
The bottom two comparison checks shouldn't be there, i.e. there's no reason to compare the value to the integer value of True.
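In draft-04, exclusiveMinimum and exclusiveMaximum are booleans that modify minimum and maximum rather than bounds of their own. A minimal sketch of that interpretation follows; `check_bounds_draft04` is a hypothetical helper written for illustration, not the library's generator.

```python
def check_bounds_draft04(value, schema):
    """Return error messages for draft-04 style numeric bounds."""
    errors = []
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return errors
    if "minimum" in schema:
        minimum = schema["minimum"]
        if schema.get("exclusiveMinimum") is True:
            # The boolean only switches the comparison from < to <=.
            if value <= minimum:
                errors.append("must be bigger than {}".format(minimum))
        elif value < minimum:
            errors.append("must be bigger than or equal to {}".format(minimum))
    if "maximum" in schema:
        maximum = schema["maximum"]
        if schema.get("exclusiveMaximum") is True:
            if value >= maximum:
                errors.append("must be smaller than {}".format(maximum))
        elif value > maximum:
            errors.append("must be smaller than or equal to {}".format(maximum))
    return errors


period_schema = {
    "type": "integer",
    "minimum": 60,
    "maximum": 1209600,
    "exclusiveMinimum": True,
    "exclusiveMaximum": True,
}
print(check_bounds_draft04(60, period_schema))
print(check_bounds_draft04(61, period_schema))
print(check_bounds_draft04(1209600, period_schema))
```

With this reading, the two boolean keywords never produce comparisons against True; they only decide whether the boundary value itself is allowed.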
I'm happy to do this myself if it's something you would be interested in having, but it would mean having a format ip (or maybe ip_address) which is just either ipv4 or ipv6. Then we could write:
"remote_addr": {"type": "string", "format": "ip"}
rather than
"remote_addr": {"type": "string", "oneOf": [{"format": "ipv4"}, {"format": "ipv6"}]}
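As an illustration of what such a combined check could accept, here is a small sketch based on the standard library's `ipaddress` module. The function name `is_ip` is hypothetical, and how it would be registered as a format in the library is left open here.

```python
import ipaddress


def is_ip(value):
    """Return True when value is a string holding a valid IPv4 or IPv6 address."""
    if not isinstance(value, str):
        return False
    try:
        # ip_address() accepts both address families and raises ValueError otherwise.
        ipaddress.ip_address(value)
    except ValueError:
        return False
    return True


print(is_ip("192.168.0.1"))
print(is_ip("::1"))
print(is_ip("not-an-ip"))
```

Note that `ipaddress` may be stricter or more lenient than the regular expressions the library uses for ipv4 and ipv6, so the two approaches are not guaranteed to agree on every input.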
The code snippet below fails with fastjsonschema.exceptions.JsonSchemaDefinitionException: definition must be an object while generating validation for additionalProperties.
import fastjsonschema
v = fastjsonschema.compile({
'$schema': 'http://json-schema.org/draft-04/schema#',
'required': ['id'],
'type': 'object',
'additionalProperties': True,
'properties': {
'id': {'type': 'string'}
}
})
As I understand the draft specification a boolean is allowed.
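To make the expected semantics concrete, the sketch below handles the boolean form of additionalProperties by hand. Both function names are hypothetical, and the schema-valued form is deliberately left unimplemented.

```python
import re


def additional_property_names(data, schema):
    """Names in data that are covered by neither properties nor patternProperties."""
    known = set(schema.get("properties", {}))
    patterns = [re.compile(p) for p in schema.get("patternProperties", {})]
    return sorted(
        name for name in data
        if name not in known and not any(p.search(name) for p in patterns)
    )


def rejected_by_additional_properties(data, schema):
    """Return the property names a boolean additionalProperties would reject."""
    rule = schema.get("additionalProperties", True)
    if rule is False:
        return additional_property_names(data, schema)
    # True (or absent) allows every extra property. A dict would be a schema
    # to apply to each extra value, which this sketch does not implement.
    return []


schema = {
    "$schema": "http://json-schema.org/draft-04/schema#",
    "required": ["id"],
    "type": "object",
    "additionalProperties": True,
    "properties": {"id": {"type": "string"}},
}
document = {"id": "abc", "extra": 1}
print(rejected_by_additional_properties(document, schema))
print(rejected_by_additional_properties(document, dict(schema, additionalProperties=False)))
```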
Should we use https://github.com/pypa/setuptools_scm to automate versioning from git tags?
With that we can just remove version from setup.py and replace it with:
setup_requires=["setuptools_scm"],
use_scm_version=True,
Here is experimental version that can support different versions of JSON Schema
https://github.com/Jokipii/python-fastjsonschema
All feedback is welcome.
============= 452 passed, 28 xfailed, 14 xpassed in 6.00 seconds ==============
The following schema is valid, and works in a compliant draft 07 implementation
{
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "x",
"definitions": {
"(+[{*#$%^&~`';:,.<> \"?!\\n}])": {
"definitions": {
"array": { "type": "array" },
"object": { "type": "object" }
},
"anyOf": [
{ "$ref": "#/definitions/(+[{*#$%^&~`';:,.<> \"?!\\n}])/definitions/array" },
{ "$ref": "#/definitions/(+[{*#$%^&~`';:,.<> \"?!\\n}])/definitions/object" }
]
}
},
"properties": {
"x": { "$ref": "#/definitions/(+[{*#$%^&~`';:,.<> \"?!\\n}])" }
}
}
import fastjsonschema, json
x = fastjsonschema.compile(json.load(open("./nested.schema.json", "r")))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/lib/python3.7/site-packages/fastjsonschema/__init__.py", line 103, in validate
return compile(definition, handlers, formats)(data)
File "/lib/python3.7/site-packages/fastjsonschema/__init__.py", line 167, in compile
exec(code_generator.func_code, global_state)
File "<string>", line 10
validate_x__definitions_(+[{*_$_^&~`';_,_<> ?!\n}])(data__x)
Some characters are properly removed, including the \" and #, but many of the remaining ones are not allowed in a Python name.
Simply replacing them with _ is not a good solution, because then a property named # becomes the same as %, _, etc.
Using things like _open_paren to escape ( is somewhat OK, until a user names a property _open_paren, and then the names ( and _open_paren are wrongly considered the same.
A solution I have seen before (in Brat code gen) is to give all special characters unique names, to treat user-inputted names that look like identifiers surrounded by __ (two underscores) specially by translating the underscores into _dunder_ (for double-underscore), and so to ensure that user input is expanded in a way that can never overlap with a different input.
I cannot find where in the JSON Schema specification it says that any character other than / is allowed as an identifier / key name, but I am sure they are. NUL bytes are also allowed, but I don't know of any Python JSON parsers that handle them properly, so I don't really care.
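One way to get collision-free names is sketched below. It is a simpler hex-escaping variant, not the Brat scheme described above, and the function names `encode_name` and `function_name` are made up for this illustration. Because the underscore itself is always escaped, every underscore in the output is a delimiter and the encoding can be decoded unambiguously.

```python
import re


def encode_name(name):
    """Map an arbitrary string to identifier-safe text without collisions.

    ASCII letters and digits are kept as they are. Every other character,
    including the underscore, becomes _<hex code point>_.
    """
    return re.sub(
        r"[^A-Za-z0-9]",
        lambda match: "_{:x}_".format(ord(match.group())),
        name,
    )


def function_name(ref_path):
    # The fixed prefix guarantees the result does not start with a digit.
    return "validate_" + encode_name(ref_path)


names = ["#", "%", "_", "(", "_open_paren", "(+[{*#$%^&~`';:,.<> \"?!\n}])"]
encoded = [function_name(n) for n in names]
for original, result in zip(names, encoded):
    print(repr(original), "->", result)
print(all(e.isidentifier() for e in encoded))
print(len(set(encoded)) == len(names))
```

The generated names are long and not pretty, which is the price of guaranteeing that two different keys never map to the same Python identifier.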
Also, double-escaping the characters doesn't help (one backslash alone is invalid to the JSON parser):
{
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "x",
"definitions": {
"\\(p\\)": {}
},
"properties": {
"x": { "$ref": "#/definitions/\\(p\\)" }
}
}
x = fastjsonschema.compile(json.load(open("./nested.schema.json", "r")))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/cat/Sync/projects/git/pyrat/.venv/lib/python3.7/site-packages/fastjsonschema/__init__.py", line 167, in compile
exec(code_generator.func_code, global_state)
File "<string>", line 10
validate_x__definitions_\(p\)(data__x)
^
SyntaxError: unexpected character after line continuation character
If not implemented, it should throw NotImplementedError.
Does python-fastjsonschema provide a function that takes the JSON schema as the first argument and the JSON document as the second, validates the document, and either returns True or False or raises an exception?
When changing properties of existing items, the document may contain only properties that are being changed. In that case, one would want to ignore "required" section of the schema. Is there an existing mechanism for that, like in some other validation libraries?
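One workaround that does not depend on any library feature is to preprocess the schema and drop every "required" keyword before compiling it. The sketch below is only an assumption about how that could be done; `without_required`, `NAME_MAPS` and `LITERALS` are hypothetical names, and keywords not listed there are simply walked recursively.

```python
# Keywords whose value maps user-chosen names to subschemas: a property that
# is literally called "required" must survive under these.
NAME_MAPS = ("properties", "patternProperties", "definitions", "dependencies")
# Keywords whose value is plain data, not a schema, and is copied untouched.
LITERALS = ("enum", "const", "default", "examples")


def without_required(schema):
    """Return a new schema in which every "required" keyword is dropped."""
    if isinstance(schema, list):
        return [without_required(item) for item in schema]
    if not isinstance(schema, dict):
        return schema
    result = {}
    for key, value in schema.items():
        if key == "required":
            continue
        if key in LITERALS:
            result[key] = value
        elif key in NAME_MAPS and isinstance(value, dict):
            result[key] = {name: without_required(sub) for name, sub in value.items()}
        else:
            result[key] = without_required(value)
    return result


full_schema = {
    "type": "object",
    "required": ["a"],
    "properties": {
        "required": {"type": "string"},
        "a": {"type": "object", "required": ["b"]},
    },
}
print(without_required(full_schema))
```

The relaxed schema could then be compiled separately and used only for partial updates, while the original schema keeps validating complete documents.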
When an object using patternProperties is further down the tree than another patternProperties field you get a KeyError.
Example schema:-
{
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"patternProperties": {
"^foo:": {
"type": "object",
"properties": {
"baz": {
"type": "object",
"patternProperties": {
"^b": {
"type": "object"
}
}
},
"bit": {
"type": "object"
}
},
"required": [
"baz",
"bit"
]
}
}
}
Example document to validate against it :-
{
"foo:bar": {
"baz": {
"bat": {
}
},
"bit": {
}
}
}
This results in :-
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<string>", line 38, in validate
KeyError: 'bit'
The following schema where the second patternProperties is replaced with a regular properties validates correctly:-
{
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"patternProperties": {
"^foo:": {
"type": "object",
"properties": {
"baz": {
"type": "object",
"properties": {
"bat": {
"type": "object"
}
}
},
"bit": {
"type": "object"
}
},
"required": [
"baz",
"bit"
]
}
}
}
The following schema, where the first patternProperties is replaced instead, also validates correctly:-
{
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"properties": {
"foo:bar": {
"type": "object",
"properties": {
"baz": {
"type": "object",
"patternProperties": {
"^b": {
"type": "object"
}
}
},
"bit": {
"type": "object"
}
},
"required": [
"baz",
"bit"
]
}
}
}
fastjsonschema.compile({"pattern": "\\x20"})("https://www.some.com")
>>>---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-36-0d8a37029eb1> in <module>
----> 1 fastjsonschema.compile({"pattern": "\\x20"})("https://www.some.com")
<string> in validate(data)
KeyError: ' '
A more complete example: "^https?://(www\\.)?[a-z0-9.-]*\\.[a-z]{2,}([^<>%\\x20\\x00-\\x1f\\x7F]|%[0-9a-fA-F]{2})*$"
Python's re handles it:
In [1]: import re
In [2]: re.compile("\\x20", flags=0)
Out[2]: re.compile(r'\x20', re.UNICODE)
In [3]: p = re.compile("\\x20", flags=0)
In [4]: p.match("s")
In [5]: p.match(" ")
Out[5]: <re.Match object; span=(0, 1), match=' '>
Hi there, thanks for this nice library.
The code I'm using
import json
from fastjsonschema import compile

with open("schema.json") as f:
    validator = compile(json.load(f))
with open("instance.json") as f:
    instance = json.load(f)
validator(instance)
raises this exception
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "<string>", line 37, in validate
File "<string>", line 37, in validate
File "<string>", line 46, in validate
UnboundLocalError: local variable 'data_val_is_dict' referenced before assignment
schema.json:
{
"type": "object",
"additionalProperties": false,
"required": [
"question"
],
"patternProperties": {
"question": {},
"predef\\d": {
"type": "object",
"additionalProperties": false,
"required": [
"next",
"text"
],
"properties": {
"text": {
"type": "string"
},
"next": {
"$ref": "#"
}
}
},
"media\\d": {
"type": "object",
"additionalProperties": false,
"required": [
"content_type",
"content_path"
],
"properties": {
"content_type": {
"type": "string"
},
"content_path": {
"type": "string"
},
"next": {
"$ref": "#"
}
}
}
}
}
instance.json:
{
"question": "sample text",
"predef1": {
"text": "sample text",
"next": {
"question": "sample text",
"predef1": {
"text": "sample text",
"next": {
"question": "sample text"
}
},
"predef2": {
"text": "sample text",
"next": {
"question": "sample text",
"media1": {
"content_type": "type",
"content_path": "url",
"next": {
"question": "sample text",
"predef1": {
"text": "sample text",
"next": {
"question": "sample text"
}
},
"predef2": {
"text": "sample text",
"next": {
"question": "sample text",
"media1": {
"content_type": "type",
"content_path": "url"
}
}
}
}
}
}
}
}
}
}
I used several online schema validators, so I'm pretty sure that both the schema and the instance are correct. I believe it has something to do with "$ref": "#", but I didn't have time to find the cause of this exception.
Here is experimental version that can support different versions of JSON Schema
https://github.com/Jokipii/python-fastjsonschema
All feedback is welcome.
============= 467 passed, 47 xfailed, 40 xpassed in 5.66 seconds ==============
Shared with draft-06:
Draft-07 specifics:
Is there any support for lazy validation (iterating through all errors that exist in document), or does the processing always stop at the first encountered error?
Thanks for the quick fix of #21 !
I just tried it, and indeed, the regex pattern is now there. But the code is still broken due to a different issue.
When I generate the code for this schema:
{
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "https://example.com/example.schema.json",
"title": "Example",
"description": "An example schema",
"type": "object",
"properties": {
"ip": {
"type": "string",
"description": "The IP of the future",
"format": "ipv6"
}
},
"required": [ "ip" ]
}
The resulting code contains the following dict:
REGEX_PATTERNS = {
"ipv6_re_pattern": re.compile('^(?:(?:[0-9A-Fa-f]{1,4}:){6}(?:[0-9A-Fa-f]{1,4}:[0-9A-Fa-f]{1,4}|(?:(?:[0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\\\\.){3}(?:[0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5]))|::(?:[0-9A-Fa-f]{1,4)
}
It looks like the line is just cut off at that point, so the code won't work.
SyntaxError: EOL while scanning string literal
As seen in the JSON Schema Test Suite, only RFC 3339 compliant strings should be accepted for date-time. This means a time zone offset is required. As ISO 8601 specifies local time is applied when there is no time zone offset, there is ambiguity when client and servers are distributed across time zones.
Can we add a new test to test_format.py?
('2018-02-05T14:17:10.00', exc),
And change the date-time regexp in fastjsonschema/draft04.py to:
'date-time': r'^\d{4}-[01]\d-[0-3]\dT[0-2]\d:[0-5]\d:[0-5]\d(?:\.\d+)?(?:[+-][0-2]\d:[0-5]\d|Z)+\Z',
To highlight the change in the regexp: the trailing ? is changed to a +.
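The behaviour of the proposed expression can be checked with the standard `re` module alone. In the sketch below, `PROPOSED` is the regexp from this issue; `EXACTLY_ONE` is an assumption added for comparison, a variant that drops the quantifier so that one and only one offset is required.

```python
import re

PROPOSED = re.compile(
    r'^\d{4}-[01]\d-[0-3]\dT[0-2]\d:[0-5]\d:[0-5]\d(?:\.\d+)?(?:[+-][0-2]\d:[0-5]\d|Z)+\Z'
)
# Assumption for comparison: no quantifier on the offset group.
EXACTLY_ONE = re.compile(
    r'^\d{4}-[01]\d-[0-3]\dT[0-2]\d:[0-5]\d:[0-5]\d(?:\.\d+)?(?:[+-][0-2]\d:[0-5]\d|Z)\Z'
)

samples = [
    "2018-02-05T14:17:10.00",      # no offset
    "2018-02-05T14:17:10.00Z",
    "2018-02-05T14:17:10+01:00",
    "2018-02-05T14:17:10ZZ",       # repeated offset
]
for text in samples:
    print(text, bool(PROPOSED.match(text)), bool(EXACTLY_ONE.match(text)))
```

Both expressions reject the string without an offset. The + quantifier additionally lets a repeated offset such as ZZ through, which the variant without a quantifier rejects.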
For me the most important feature is keeping track of the context/scope the validation process is in when the exception is thrown. So given a schema:
article.json
{
"$schema": "http://json-schema.org/draft-07/schema#",
"type" : "object",
"additionalProperties": false,
"properties": {
"teaser" : { "$ref": "intro.json" },
"prolog" : { "$ref": "intro.json" }
}
}
when the error occurs during the prolog validation, the exception should contain an attribute with a value like article.prolog..... describing where it happened (like the jsonschema library does).
Does the library support local references? For example, I have common.json, schema1.json and schema2.json, all in the same directory, and I would like to use something defined in common.json from the other two. I noticed that the library tried to download the referenced schema, but I would prefer to use the local file by just giving a relative path such as "$ref": "common.json#/definitions/xyz".
Since commit b21744f, I always have 30 failed tests, regarding all drafts.
# Do not pass local state so it can recursively call itself.
> exec(code_generator.func_code, global_state)
E File "<string>", line 16
E data_len = len(data)
E ^
E IndentationError: unexpected indent
fastjsonschema/__init__.py:167: IndentationError
I don't understand why. The failures are now always related to the allOf.json tests:
tests/json_schema/test_draft04.py::test[additionalProperties.json / additionalProperties should not look in applicators / properties defined in allOf are not allowed] FAILED [ 18%]
tests/json_schema/test_draft04.py::test[allOf.json / allOf / allOf] FAILED [ 18%]
tests/json_schema/test_draft04.py::test[allOf.json / allOf / mismatch second] FAILED [ 18%]
tests/json_schema/test_draft04.py::test[allOf.json / allOf / mismatch first] FAILED [ 18%]
tests/json_schema/test_draft04.py::test[allOf.json / allOf / wrong type] FAILED [ 18%]
tests/json_schema/test_draft04.py::test[allOf.json / allOf with base schema / valid] FAILED [ 18%]
tests/json_schema/test_draft04.py::test[allOf.json / allOf with base schema / mismatch base schema] FAILED [ 18%]
tests/json_schema/test_draft04.py::test[allOf.json / allOf with base schema / mismatch first allOf] FAILED [ 19%]
tests/json_schema/test_draft04.py::test[allOf.json / allOf with base schema / mismatch second allOf] FAILED [ 19%]
tests/json_schema/test_draft04.py::test[allOf.json / allOf with base schema / mismatch both] FAILED [ 19%]
When I ran the unittest case test_nonExistProject with fastjsonschema from the master branch of this repo, the output was:
AssertionError: False is not true : data.Code must match pattern MissingParameter
response: {'RequestId': '', 'HostId': '', 'Code': 'InvalidProject.NotFound', 'Message': 'Specified project is not found.'}
schema: {'id': 'http://localhost:8000/', '$ref': 'http://localhost:8000/definitions.json#/NotFoundDefinition'}
The AssertionError suggests that the MissingParameter schema was used,
but the debug info I added in the is_data_validate function suggests that the current error came from the Invalid.NotFound schema.
Why are they not the same?
def is_data_validate(response, name):
    schema = SCHEMAS[name]
    validator = fastjsonschema.compile(schema)
    try:
        validator(response)
    except fastjsonschema.JsonSchemaException as e:
        msg = e.message + "\n\nresponse: %s\n\nschema: %s" % (response, schema)
        return False, msg
    else:
        return True, "validate %s pass" % name

class RestTestCase(unittest.TestCase):
    def do_request_and_check_response(self, schema_name):
        self.response = myclient.do_action_with_exception(self.request)
        result, msg = is_data_validate(self.response, schema_name)
        self.assertTrue(result, msg)

class TestCreateTagSet(client.RestTestCase):
    # Here is the test case that ran with an unexpected result.
    def test_nonExistProject(self):
        schema_project_list = [('MissingParameter', ""), ('Invalid.NotFound', "non-exists")]
        for schema_name, project in schema_project_list:
            self.request.set_Project(project)
            self.do_request_and_check_response(schema_name)
here is the json file for reference,
[
{
"name": "MissingParameter",
"schema": {
"id": "http://localhost:8000/",
"type": "object",
"allOf": [
{"$ref": "definitions.json#/HostRequestDefinition"},
{
"properties": {
"Code": {"type": "string", "pattern": "MissingParameter"},
"Message": {"type": "string", "pattern": "The parameter is missing."}
},
"required": ["Code", "Message"]
}
]
}
},
{
"name": "Invalid.NotFound",
"schema": {
"id": "http://localhost:8000/",
"$ref": "definitions.json#/NotFoundDefinition"
}
}
]
file: definitions.json
{
"NotFoundDefinition": {
"type": "object",
"allOf": [
{"$ref": "#/HostRequestDefinition"},
{
"properties": {
"Code": {"type": "string", "pattern": "Invalid(Set|Project).NotFound"},
"Message": {"type": "string", "pattern": "Specified (set|project) is not found."}
},
"required:": ["Code", "Message"]
}
]
}
}
For schema:
schema = {'type': 'number', 'format': 'float64'}
fastjsonschema.compile(schema)
raises:
File "/.pyenv/versions/3.7.0/lib/python3.7/site-packages/fastjsonschema/draft04.py", line 268, in generate_format
raise JsonSchemaDefinitionException('Unknown format: {}'.format(format_))
fastjsonschema.exceptions.JsonSchemaDefinitionException: Unknown format: float64
Hello,
I'm getting this error on the latest version (fastjsonschema==2.13). It seems to be related to #49.
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "fastjsonschema/__init__.py", line 103, in validate
return compile(definition)(data)
File "<string>", line 48, in validate
File "<string>", line 104, in validate___scenario
File "<string>", line 104, in validate___scenario
File "<string>", line 104, in validate___scenario
[Previous line repeated 1 more time]
File "<string>", line 150, in validate___scenario
File "<string>", line 104, in validate___scenario
File "<string>", line 150, in validate___scenario
File "<string>", line 104, in validate___scenario
File "<string>", line 104, in validate___scenario
File "<string>", line 104, in validate___scenario
[Previous line repeated 4 more times]
File "<string>", line 150, in validate___scenario
File "<string>", line 104, in validate___scenario
File "<string>", line 150, in validate___scenario
File "<string>", line 104, in validate___scenario
File "<string>", line 104, in validate___scenario
File "<string>", line 137, in validate___scenario
UnboundLocalError: local variable 'data_val__contentpath_len' referenced before assignment
Sorry for such a big example. That seems to be the smallest example to reproduce this bug.
I have a schema which requires the datatype "string". I'd like to change this to require a datatype "safestring" (a non-folded, memory-clearing version of string) which cannot be derived from str.
Is there a fairly trivial way to either:
a) change the type to reference a custom class, or
b) change fjs's conception of what is allowable for strings?
For now I just monkey patch in my own CodeGeneratorDraft07 class and modify the JSON_TYPE_TO_PYTHON_TYPE instance.
If not, I'd be happy to add it.
When attempting to compile some schemas, the following stack trace is encountered:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.5/dist-packages/fastjsonschema/__init__.py", line 80, in compile
resolver = RefResolver.from_schema(definition, handlers=handlers)
File "/usr/local/lib/python3.5/dist-packages/fastjsonschema/ref_resolver.py", line 109, in from_schema
return cls(schema.get('id', ''), schema, handlers=handlers, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/fastjsonschema/ref_resolver.py", line 98, in __init__
self.walk(schema)
File "/usr/local/lib/python3.5/dist-packages/fastjsonschema/ref_resolver.py", line 180, in walk
self.walk(item)
File "/usr/local/lib/python3.5/dist-packages/fastjsonschema/ref_resolver.py", line 173, in walk
self.store[normalize(self.resolution_scope)] = node
File "/usr/local/lib/python3.5/dist-packages/fastjsonschema/ref_resolver.py", line 45, in normalize
return urlparse.urlsplit(uri).geturl()
File "/usr/lib/python3.5/urllib/parse.py", line 328, in urlsplit
url, scheme, _coerce_result = _coerce_args(url, scheme)
File "/usr/lib/python3.5/urllib/parse.py", line 115, in _coerce_args
return _decode_args(args) + (_encode_result,)
File "/usr/lib/python3.5/urllib/parse.py", line 99, in _decode_args
return tuple(x.decode(encoding, errors) if x else '' for x in args)
File "/usr/lib/python3.5/urllib/parse.py", line 99, in <genexpr>
return tuple(x.decode(encoding, errors) if x else '' for x in args)
AttributeError: 'dict' object has no attribute 'decode'
...I am still trying to get more information about the schemas that cause this.
Example code to trigger problem:
import fastjsonschema
schema = {
"type": "object",
"properties": {
"id": {"type": "integer"}
}
}
fastjsonschema.compile(schema)
This appears to be introduced in 1.5; not encountered in 1.4.
Hello,
I found this weird bug in compile code generation; to reproduce just create a file named Section.json with the following content (create another dummy json for OtherSection too).
{
"title": "Section",
"type": "object",
"properties": {
"SectionList": {
"oneOf": [
{
"type": "array",
"items": {
"$ref": "file:///Section.json"
}
},
{
"type": "array",
"items": {
"$ref": "file:///OtherSection.json"
}
}
]
}
}
}
The combination of 'oneOf' and 'array' seems to confuse the code generator, which raises the exception IndentationError: expected an indented block.
Thanks for this great package!
The following schema will generate invalid code:
{
"enum": ["Bernardo O'Higgins"]
}
This is the result:
if data not in ["Bernardo O'Higgins"]:
raise JsonSchemaException("data must be one of ["Bernardo O'Higgins"]")
Note that quotes are not properly escaped, resulting in invalid Python code.
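The usual remedy when emitting string literals into generated source is to let `repr()` do the quoting. The sketch below compares the two approaches; `ValueError` stands in for JsonSchemaException, and the helper names are hypothetical rather than taken from the library.

```python
def is_valid_python(source):
    """Return True when source compiles without a SyntaxError."""
    try:
        compile(source, "<generated>", "exec")
    except SyntaxError:
        return False
    return True


values = ["Bernardo O'Higgins"]
message = "data must be one of {}".format(values)

# Naive emission: the message is pasted between hard-coded double quotes.
naive_line = 'raise ValueError("{}")'.format(message)
# Escaped emission: repr() picks the quotes and escapes whatever clashes.
escaped_line = "raise ValueError({!r})".format(message)

print(naive_line, is_valid_python(naive_line))
print(escaped_line, is_valid_python(escaped_line))
```

Executing the escaped line raises the exception with the original, unmodified message, so the escaping does not leak into what the user sees.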
I like the fact that this library is very fast compared to jsonschema, but in terms of memory usage it isn't any better.
The problem is that requests takes about 15 MB of memory after being loaded, with the result that the module takes about 20 MB.
Maybe I will find time to replace requests with urllib. Are you OK with a pull request like that?
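For reference, fetching and parsing a remote JSON document needs nothing beyond the standard library. The following is a minimal sketch of such a replacement; the function name `fetch_json` and the `timeout` default are assumptions, not the library's code. The demonstration uses a file:// URI so that it runs without network access.

```python
import json
import pathlib
import tempfile
import urllib.request


def fetch_json(uri, timeout=10):
    """Load a JSON document from a URI using only the standard library."""
    with urllib.request.urlopen(uri, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Demonstration with a temporary local file instead of a real server.
with tempfile.TemporaryDirectory() as folder:
    path = pathlib.Path(folder) / "common.json"
    path.write_text(
        json.dumps({"definitions": {"xyz": {"type": "string"}}}),
        encoding="utf-8",
    )
    schema = fetch_json(path.as_uri())

print(schema["definitions"]["xyz"])
```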
memory_profiler result:
Line # Mem usage Increment Line Contents
================================================
6 24.7 MiB 24.7 MiB @profile
7 def main():
8
9 40.5 MiB 15.8 MiB import fastjsonschema
The lib would take 0.1 MB of memory without requests and about 6 MB using urllib:
Line # Mem usage Increment Line Contents
================================================
6 24.7 MiB 24.7 MiB @profile
7 def main():
8
9 40.4 MiB 15.7 MiB import requests
10 40.5 MiB 0.1 MiB import fastjsonschema
"It compiles definition into Python most stupid code which people would have hard time to write by themselves because of not-written-rule DRY (don't repeat yourself)".
What does that mean? What "most stupid code"? Does the "It" in that sentence refer to python-fastjsonschema?
Also, the opening of the documentation talks about how slow standard jsonschema is, then something about "compiles definition" and then a benchmark of fastjsonschema. Where is the comparison benchmark?
Is it even possible?
Is there a way to iterate through multiple errors during JSON validation? Currently it reports only one.
The URL http://opensource.seznam.cz/python-fastjsonschema/ does not exist.
Validation fails with this example:
import json
import fastjsonschema
schema = {
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "https://example.com/example.schema.json",
"title": "Example",
"description": "An example schema",
"type": "object",
"properties": {
"date": {
"type": "string",
"description": "Some date",
"format": "date-time"
}
}
}
validate = fastjsonschema.compile(schema)
validate({"date": "2018-02-05T14:17:10Z"})
The output is:
Traceback (most recent call last):
File "validate_simple_example.py", line 22, in <module>
validate({"date": "2018-02-05T14:17:10Z"})
File "<string>", line 16, in validate_https___example_com_example_schema_json
fastjsonschema.exceptions.JsonSchemaException
According to the json schema docs all RFC 3339 timestamps should be valid.
I think the problem is the milliseconds part. It should be optional if I'm not wrong.
The above example runs fine with: validate({"date": "2018-02-05T14:17:10.00Z"})
The current regex is:
^\d{4}-[01]\d-[0-3]\dT[0-2]\d:[0-5]\d:[0-5]\d\.\d+(?:[+-][0-2]\d:[0-5]\d|Z)?$
I suggest changing it to:
^\d{4}-[01]\d-[0-3]\dT[0-2]\d:[0-5]\d:[0-5]\d(?:\.\d+)?(?:[+-][0-2]\d:[0-5]\d|Z)?$
Also maybe it's worth thinking about not using regexes for format validation for some of the stuff (like ips, dates, etc.)
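The difference between the two expressions quoted in this issue can be confirmed directly with `re`. `CURRENT` and `SUGGESTED` below are just labels for the regexes from the report.

```python
import re

CURRENT = re.compile(
    r'^\d{4}-[01]\d-[0-3]\dT[0-2]\d:[0-5]\d:[0-5]\d\.\d+(?:[+-][0-2]\d:[0-5]\d|Z)?$'
)
SUGGESTED = re.compile(
    r'^\d{4}-[01]\d-[0-3]\dT[0-2]\d:[0-5]\d:[0-5]\d(?:\.\d+)?(?:[+-][0-2]\d:[0-5]\d|Z)?$'
)

for text in ("2018-02-05T14:17:10Z", "2018-02-05T14:17:10.00Z"):
    print(text, bool(CURRENT.match(text)), bool(SUGGESTED.match(text)))
```

Only the timestamp without fractional seconds is treated differently: the current expression requires the fractional part, the suggested one makes it optional.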
This example fails because the indent decorator removes the second try block:
import fastjsonschema
spec = {
"$schema": "http://json-schema.org/draft-04/schema#",
"description": "Validation schema for OpenAPI Specification 3.0.X.",
"type": "object",
"required": [
"openapi",
"info",
"paths"
],
"properties": {
"components": {
"$ref": "#/definitions/SchemaXORContent"
}
},
"patternProperties": {
"^x-": {
}
},
"additionalProperties": False,
"definitions": {
"SchemaXORContent": {
"description": "Schema and content are mutually exclusive, at least one is required",
"not": {
"required": [
"schema",
"content"
]
},
"oneOf": [
{
"required": [
"schema"
]
},
{
"required": [
"content"
],
"description": "Some properties are not allowed if content is present",
"allOf": [
{
"not": {
"required": [
"style"
]
}
},
{
"not": {
"required": [
"explode"
]
}
}
]
}
]
}
}
}
validate = fastjsonschema.compile(spec)
Traceback (most recent call last):
File "fast.py", line 64, in <module>
validate = fastjsonschema.compile(spec)
File "lib/python3.7/site-packages/fastjsonschema/__init__.py", line 167, in compile
exec(code_generator.func_code, global_state)
File "<string>", line 42
except JsonSchemaException: pass
^
IndentationError: unindent does not match any outer indentation level
and the generated code:
def validate___definitions_schemaxorcontent(data):
data_one_of_count = 0
if data_one_of_count < 2:
try:
data_is_dict = isinstance(data, dict)
if data_is_dict:
data_len = len(data)
if not all(prop in data for prop in ['schema']):
raise JsonSchemaException("data must contain ['schema'] properties", value=data, name="data", definition={'required': ['schema']}, rule='required')
data_one_of_count += 1
except JsonSchemaException: pass
if data_one_of_count < 2:
try:
# try: should be here <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
data_is_dict = isinstance(data, dict)
if data_is_dict:
data_len = len(data)
if not all(prop in data for prop in ['style']):
raise JsonSchemaException("data must contain ['style'] properties", value=data, name="data", definition={'required': ['style']}, rule='required')
except JsonSchemaException: pass
else:
raise JsonSchemaException("data must not be valid by not definition", value=data, name="data", definition={'not': {'required': ['style']}}, rule='not')
try:
data_is_dict = isinstance(data, dict)
if data_is_dict:
data_len = len(data)
if not all(prop in data for prop in ['explode']):
raise JsonSchemaException("data must contain ['explode'] properties", value=data, name="data", definition={'required': ['explode']}, rule='required')
except JsonSchemaException: pass
else:
raise JsonSchemaException("data must not be valid by not definition", value=data, name="data", definition={'not': {'required': ['explode']}}, rule='not')
data_is_dict = isinstance(data, dict)
if data_is_dict:
data_len = len(data)
if not all(prop in data for prop in ['content']):
raise JsonSchemaException("data must contain ['content'] properties", value=data, name="data", definition={'required': ['content'], 'description': 'Some properties are not allowed if content is present', 'allOf': [{'not': {'required': ['style']}}, {'not': {'required': ['explode']}}]}, rule='required')
data_one_of_count += 1
except JsonSchemaException: pass
if data_one_of_count != 1:
raise JsonSchemaException("data must be valid exactly by one of oneOf definition", value=data, name="data", definition={'description': 'Schema and content are mutually exclusive, at least one is required', 'not': {'required': ['schema', 'content']}, 'oneOf': [{'required': ['schema']}, {'required': ['content'], 'description': 'Some properties are not allowed if content is present', 'allOf': [{'not': {'required': ['style']}}, {'not': {'required': ['explode']}}]}]}, rule='oneOf')
try:
data_is_dict = isinstance(data, dict)
if data_is_dict:
data_len = len(data)
if not all(prop in data for prop in ['schema', 'content']):
raise JsonSchemaException("data must contain ['schema', 'content'] properties", value=data, name="data", definition={'required': ['schema', 'content']}, rule='required')
except JsonSchemaException: pass
else:
raise JsonSchemaException("data must not be valid by not definition", value=data, name="data", definition={'description': 'Schema and content are mutually exclusive, at least one is required', 'not': {'required': ['schema', 'content']}, 'oneOf': [{'required': ['schema']}, {'required': ['content'], 'description': 'Some properties are not allowed if content is present', 'allOf': [{'not': {'required': ['style']}}, {'not': {'required': ['explode']}}]}]}, rule='not')
return data
anyOf and oneOf seem to be ignored in this example: only the first schema in the array is tested. Other validators don't report any error for it. The criterion used to evaluate the property value doesn't seem to matter; const and pattern give the same result.
Schema
{
"anyOf": [
{
"properties": {
"a": {"const": "a"},
"b": {"const": "a"}
}
},
{
"properties": {
"a": {"const": "b"},
"b": {"const": "b"}
}
}
]
}
Example A: validated correctly
{
"a": "a",
"b": "a"
}
Example B: unexpected error
{
"a": "b",
"b": "b"
}
Python: 3.7.3 (64 bits)
fastjsonschema: 2.13
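For reference, the behaviour the reporter expects from anyOf can be sketched with a tiny hand-written checker. This is not fastjsonschema's generated code: matches and any_of are hypothetical helpers that only understand properties with const, which is enough for the schema above.

```python
# Hypothetical helpers for illustration only; they cover just the
# "properties" + "const" keywords used in the schema above.

def matches(subschema, data):
    """Return True if every listed property that is present equals its const."""
    for name, rule in subschema.get("properties", {}).items():
        if name in data and "const" in rule and data[name] != rule["const"]:
            return False
    return True


def any_of(subschemas, data):
    """anyOf passes as soon as one subschema accepts the data."""
    return any(matches(sub, data) for sub in subschemas)


schema = {
    "anyOf": [
        {"properties": {"a": {"const": "a"}, "b": {"const": "a"}}},
        {"properties": {"a": {"const": "b"}, "b": {"const": "b"}}},
    ]
}

example_a = {"a": "a", "b": "a"}
example_b = {"a": "b", "b": "b"}

result_a = any_of(schema["anyOf"], example_a)  # accepted by the first branch
result_b = any_of(schema["anyOf"], example_b)  # accepted by the second branch
```

Under these semantics both examples are valid, and a mixed document such as {"a": "a", "b": "b"} is the one that should be rejected.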
I have a schema with something similar to the following in it.
{
"$schema": "http://json-schema.org/draft-04/schema#",
"type": "object",
"properties": {
"object_key": {
"title": "My Object Entry",
"description": "An object entry that must always have a value of 'foo'.",
"type": "string",
"enum":[
"foo"
]
}
}
}
This will match against a key called objectkey but reject one called object_key. I'm also seeing a schema with a key @_ns match against a document containing ns but not @_ns.
For the same reason, the examples at https://spacetelescope.github.io/understanding-json-schema/reference/object.html#properties will not work because of the property street_name.
It would be useful (I would say even necessary) to put more information in JsonSchemaException. The message generated by the library is rarely fit to be displayed to the end user, and there is also the matter of i18n. It should be possible to identify the part of the document that was deemed invalid, the corresponding part of the schema, and the name of the validation rule that raised the exception.
In Python 3.7 with warnings enabled, fastjsonschema.compile() (or rather the code generated by it) emits DeprecationWarnings when using pattern or patternProperties with some regexps:
$ python3 --version
Python 3.7.0
$ python3 -Wall -c 'import fastjsonschema; fastjsonschema.compile({"pattern": r"\d"})'
<string>:6: DeprecationWarning: invalid escape sequence \d
$ python3 -Wall -c 'import fastjsonschema; fastjsonschema.compile({"patternProperties": {r"\d":{}}})'
<string>:8: DeprecationWarning: invalid escape sequence \d
It would be good if the generated code was warning-free on Python 3.7. (In our specific case, this warning pollutes our test suite output.)
I can probably prepare a patch, this looks like a simple escaping issue.
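The escaping problem can be reproduced with the standard library alone. The sketch below assumes a code generator that pastes the pattern between plain quotes, and shows repr() as one possible way to emit a warning-free literal; it is an illustration, not necessarily the fix the project adopted. The names naive_source, safe_source and count_warnings are made up for this example. Newer Python versions report the same problem as a SyntaxWarning rather than a DeprecationWarning.

```python
import warnings

pattern = r"\d"  # the regular expression taken from the schema

# Naive generation pastes the pattern between plain quotes, so the
# generated source contains the non-raw literal "\d".
naive_source = 'import re\nREGEX = re.compile("' + pattern + '")\n'

# repr() emits a literal with the backslash escaped ('\\d'), which
# evaluates back to the same pattern.
safe_source = "import re\nREGEX = re.compile(" + repr(pattern) + ")\n"


def count_warnings(source):
    """Compile generated source and count the warnings it triggers."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        compile(source, "<generated>", "exec")
    return len(caught)


naive_warnings = count_warnings(naive_source)
safe_warnings = count_warnings(safe_source)

# The safe variant still produces a working regular expression.
namespace = {}
exec(compile(safe_source, "<generated>", "exec"), namespace)
```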
Aaand another one. Hope you aren't getting annoyed. Running this example throws a JsonSchemaException:
import fastjsonschema
schema = {
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "https://example.com/example.schema.json",
"title": "Example",
"description": "An example schema",
"type": "object",
"properties": {
"host": {
"type": "string",
"description": "Some hostname",
"format": "hostname"
}
}
}
validate = fastjsonschema.compile(schema)
validate({"host": "example.de"})
For 'google.com' it works fine. I'll create a PR soon, probably.
if seems to act as though its contents had been moved outside: any value that fails the if schema produces an error, instead of the result being ignored and used only to choose between the then and else clauses. Other validators don't report any error for it. The criterion used to evaluate the value doesn't seem to matter; const and pattern give the same result.
Schema
{
"if": {
"const": "a"
}
}
Example A: validated correctly
"a"
Example B (any valid JSON except "a"): unexpected error
""
Python: 3.7.3 (64 bits)
fastjsonschema: 2.13
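The draft-07 semantics the reporter describes can be sketched with plain predicates. check_conditional and is_a are hypothetical names used only for this illustration; they are not part of fastjsonschema.

```python
# Hypothetical helper for illustration; each argument is a predicate
# (a function returning True or False) standing in for a subschema.

def check_conditional(data, if_check, then_check=None, else_check=None):
    """Apply JSON Schema draft-07 if/then/else semantics to predicates."""
    if if_check(data):
        # "if" matched: only "then" (when present) decides the result.
        return then_check(data) if then_check else True
    # "if" did not match: only "else" (when present) decides the result.
    return else_check(data) if else_check else True


def is_a(value):
    return value == "a"  # stands in for {"const": "a"}


example_a = check_conditional("a", is_a)  # "if" matches, no "then"
example_b = check_conditional("", is_a)   # "if" fails, no "else"
```

With neither then nor else present, both examples are valid: the outcome of if on its own never rejects a document.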
import fastjsonschema
schema_file = open("schema.json")
schema_str = schema_file.read()
# the raw file contents (a str) are passed to compile()
validation = fastjsonschema.compile(schema_str)
{
"$schema": "http://json-schema.org/draft-04/schema#",
"title": "organization",
"description": "JSON schema",
"type": "object",
"properties": {
"transactionDetail": {
"id": "https://example.com/transactionDetail",
"type": "object",
"properties": {
"transactionID": {
"description": "A number assigned by the calling application to uniquely identify this request.",
"type": "string"
},
"transactionTimestamp": {
"description": "The date and time when this request was submitted.",
"type": "string"
}
},
"required": [
"transactionID"
],
"additionalProperties": false
},
"organization": {
"$ref": "#/definitions/organization"
}
},
"additionalProperties": false,
"definitions": {
"organization": {
"type": "object",
"properties": {
"identifier": {
"description": "identification number.",
"type": "string",
"minLength": 1,
"maxLength": 12
},
"countryCode": {
"description": "The two-letter country code.",
"type": "string",
"minLength": 2,
"maxLength": 2
},
"timestamp": {
"description": "The date and time that the record was created.",
"type": "string"
}
},
"required": [
"identifier",
"countryCode"
],
"additionalProperties": false
}
}
}
Traceback (most recent call last):
File "", line 1, in
File "/dnbusr1/z-gda-pad/anaconda3/lib/python3.6/site-packages/fastjsonschema/__init__.py", line 153, in compile
resolver, code_generator = _factory(definition, handlers)
File "/dnbusr1/z-gda-pad/anaconda3/lib/python3.6/site-packages/fastjsonschema/__init__.py", line 193, in _factory
resolver = RefResolver.from_schema(definition, handlers=handlers)
File "/dnbusr1/z-gda-pad/anaconda3/lib/python3.6/site-packages/fastjsonschema/ref_resolver.py", line 89, in from_schema
**kwargs
File "/dnbusr1/z-gda-pad/anaconda3/lib/python3.6/site-packages/fastjsonschema/ref_resolver.py", line 78, in __init__
self.walk(schema)
File "/dnbusr1/z-gda-pad/anaconda3/lib/python3.6/site-packages/fastjsonschema/ref_resolver.py", line 148, in walk
elif '$ref' in node and isinstance(node['$ref'], str):
TypeError: string indices must be integers
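The traceback is consistent with the schema reaching the resolver as a str rather than a parsed mapping. The sketch below reproduces the failing expression from ref_resolver.py with the standard library only, using a shortened stand-in for schema.json. Assuming this is the cause, parsing the text with json.load or json.loads before calling fastjsonschema.compile should avoid the error.

```python
import json

# A shortened stand-in for the contents of schema.json.
schema_str = '{"type": "object", "properties": {"organization": {"$ref": "#/definitions/organization"}}}'

# On a str, "in" is a substring test, so it succeeds and the
# subscript that follows raises TypeError, as in the traceback.
node = schema_str
try:
    hit = '$ref' in node and isinstance(node['$ref'], str)
    error = None
except TypeError as exc:
    error = exc

# Parsing first yields a dict, which can be indexed by key.
schema = json.loads(schema_str)
ref = schema["properties"]["organization"]["$ref"]
```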
In the jsonschema library there is a best_match() function to try and match the underlying error from a failure in the schema validation: https://python-jsonschema.readthedocs.io/en/stable/errors/#best-match-and-relevance
This is especially useful for schemas that contain oneOfs, because an error in the payload being verified makes the oneOf fail, since none of its subschemas match. That top-level failure isn't normally useful when the real cause is an error inside an otherwise valid subschema.
I've been using fastjsonschema with a particularly bad example of such a schema, published at https://qiskit.org/schemas/qobj_schema.json, which uses large and often nested oneOfs. Figuring out the cause of a validation error with fastjsonschema is very difficult because it is often a top-level oneOf rule that fails, and the rule definition is the entire contents under that oneOf (which can be quite large). So the error message returned by the JsonSchemaException doesn't really help for debugging, nor do any of the other exception parameters. I've had to either use best_match() from jsonschema on failure or, if I have a known working example, compare it side by side with the failing payload. Both are less than ideal.
It would be a great enhancement to add some way of debugging these scenarios to the library so we can get the speed benefits of this library without having to sacrifice ease of debugging.
In the JSON_TYPE_TO_PYTHON_TYPE mapping, "string" does not include the unicode type for Python 2, so it cannot validate JSON objects that contain unicode strings.
Also, "number" and "integer" do not include the long type in Python 2.
We relied on undefined formats (format="base64", etc.). We use them to create custom transcoders that can encode/decode JSON (instead of formats, we use transcodes). Our code works with 2.13. FYI: the new release no longer works. No big deal, because I can just add all my formatters with lambda: True to fix it (or change the name of the key I use).
...
File "site-packages/fastjsonschema/__init__.py", line 165, in compile
File "site-packages/fastjsonschema/draft04.py", line 68, in global_state
File "site-packages/fastjsonschema/generator.py", line 74, in global_state
File "site-packages/fastjsonschema/generator.py", line 115, in _generate_func_code
File "vidamessage/schema/__init__.py", line 27, in generate_func_code
File "site-packages/fastjsonschema/generator.py", line 129, in generate_func_code
File "site-packages/fastjsonschema/generator.py", line 139, in generate_validation_function
File "site-packages/fastjsonschema/generator.py", line 152, in generate_func_code_block
File "site-packages/fastjsonschema/draft06.py", line 35, in _generate_func_code_block
File "site-packages/fastjsonschema/generator.py", line 170, in run_generate_functions
File "site-packages/fastjsonschema/draft04.py", line 462, in generate_properties
File "site-packages/fastjsonschema/generator.py", line 152, in generate_func_code_block
File "site-packages/fastjsonschema/draft06.py", line 35, in _generate_func_code_block
File "site-packages/fastjsonschema/generator.py", line 170, in run_generate_functions
File "site-packages/fastjsonschema/draft04.py", line 268, in generate_format
fastjsonschema.exceptions.JsonSchemaDefinitionException: Undefined format %s
Hi, pls. for this example:
import fastjsonschema
schema = {
'type': 'object',
'properties': {
'hash': {
'anyOf': [
{
'type': 'string',
'pattern': '^AAA'
},
{
'type': 'string',
'pattern': '^BBB'
}
]
}
}
}
validator = fastjsonschema.compile(schema)
data = {
'hash': 'AAAXXX',
}
validator(data)
Expected: data are valid against the schema.
Actual: fastjsonschema.exceptions.JsonSchemaException: data.hash must be valid by one of anyOf definition
Python version: 3.6
fastjsonschema version: 1.2
Looks like anyOf together with pattern matching is not working as expected.
I can't get the "const" keyword to work properly for string values:
In [12]: fastjsonschema.compile({"const": "a"})("a")
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-12-ab19503ee02c> in <module>()
----> 1 fastjsonschema.compile({"const": "a"})("a")
<string> in validate(data)
NameError: name 'a' is not defined
Looking at the code produced by the compile_to_code function, it seems that the value "a" is just not properly quoted:
VERSION = "2.3"
from fastjsonschema import JsonSchemaException
NoneType = type(None)
def validate(data):
    if data != a:
        raise JsonSchemaException("data must be same as const definition")
    return data
Indeed, it works for values that don't require quoting, like integers.
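The quoting problem can be illustrated with a hypothetical mini generator. generate_const_validator is not the library's code, and ValueError stands in for JsonSchemaException so the sketch runs without fastjsonschema installed. It contrasts pasting the value with str() against emitting a proper Python literal with repr().

```python
# Hypothetical mini generator for illustration; it is not the
# library's code. ValueError stands in for JsonSchemaException.

def generate_const_validator(const_value, quote):
    """Build and return a validate() function for a const rule."""
    literal = repr(const_value) if quote else str(const_value)
    source = (
        "def validate(data):\n"
        "    if data != " + literal + ":\n"
        "        raise ValueError('data must be same as const definition')\n"
        "    return data\n"
    )
    namespace = {}
    exec(source, namespace)
    return namespace["validate"]


unquoted = generate_const_validator("a", quote=False)
quoted = generate_const_validator("a", quote=True)

try:
    unquoted("a")
    unquoted_error = None
except NameError as exc:
    unquoted_error = exc  # the bare name a is looked up and not found

quoted_result = quoted("a")
```

The unquoted variant happens to work for integers because their str() form is already a valid literal, which matches the observation above.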
Hi there!
I am trying to use this library for schema validation but I am finding that it compiles schemas that don't adhere to the standard for JSON schemas.
For example, the following code:
fastjsonschema.compile({"title" : "test", "type": "object", "properties": {"test": ["THING"]}})
compiles with the latest version of fastjsonschema on PyPI; however, it is not a valid JSON schema. Does this library check whether schemas are valid before compiling them?