Comments (9)
The only one that I might consider implementing is the first suggestion. The main reason is that I do not like mixing data and schema in the same file; I think it is very messy. The best thing about the first option is that a pointer to the resource can be implemented in a way that allows any type of location for that resource. It would make sense to implement the standard patterns such as file://, git://, ftp:// and so on. It would also let implementers write their own handlers in case a default one does not exist yet in the lib.
The bad thing is that the Python YAML parser does not support, out of the box, getting the global tags after the data has been loaded. On the other hand, it is very easy to just parse the file, take the first %SCHEMA pattern and use it.
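A minimal sketch of the "parse the file and take the first %SCHEMA pattern" idea, assuming the directive sits on its own line before the first document start marker. The function name find_schema_location is hypothetical and not part of pykwalify.

```python
import re

# Matches a line such as "%SCHEMA file:///usr/foo.yaml" (the proposed directive).
_SCHEMA_DIRECTIVE = re.compile(r"^%SCHEMA[ \t]+(\S+)[ \t]*$")


def find_schema_location(text):
    """Return the location from the first %SCHEMA line, or None if absent."""
    for line in text.splitlines():
        if line == "---" or line.startswith("--- "):
            # Stop at the document start marker; directives come before it.
            break
        match = _SCHEMA_DIRECTIVE.match(line)
        if match:
            return match.group(1)
    return None


document = "%SCHEMA file:///usr/foo.yaml\n---\nsome: data\n"
print(find_schema_location(document))
```

This only scans the raw text; it does not rely on the YAML parser exposing the directive after loading.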
from pykwalify.
I agree with you; I also prefer the first option, although %SCHEMA may not be recognized by all parsers. That said, it is not a forbidden tag, so I guess we are free to use it.
From the YAML spec, the two recognized % directives are %YAML and %TAG. So we could also use something like this:
%TAG !schema! file:///usr/local/share/foo.yml
Last but not least, using the word %SCHEMA is perhaps a bit pretentious, in that it implies PyKwalify is (and will become) the default standard YAML validator. I see this as a very good thing, but I guess some will not.
Next step would be to add this validation support to the PyYAML module...
from pykwalify.
I am still more in favor of the initial suggestion of
%SCHEMA file:///usr/foo.yaml
because it is cleaner and easier to use. At least PyYAML does not throw an error if I add %SCHEMA at the top of the file, but at the same time I understand that the spec is more in favor of %TAG !schema! ... . On the other hand, I did find this in the spec:
Directives are instructions to the YAML processor. This specification defines two directives, “YAML” and “TAG”, and reserves all other directives for future use. There is no way to define private directives. This is intentional.
So any tag can be implemented, and the spec says that it should be compatible. If any other client is not compatible with a custom tag, then that client is broken, not pykwalify's use of it :]
from pykwalify.
So let's choose %SCHEMA and make a new standard for %YAML 1.3 that all the world will use!
The next step will be to extend PyYAML to support PyKwalify...
from pykwalify.
And after that, world domination :]
But %SCHEMA ... it will be. I think that initially only file:// will be supported out of the box, but a plugin type of system shall also be added so that it can be extended to support other formats in the future.
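One way such a plugin system could look, as a sketch only: a registry keyed by URL scheme, with file:// registered by default. The names register_handler, load_file_url and load_schema are hypothetical and do not exist in pykwalify.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname

# Maps a URL scheme to a callable that returns the schema text.
_HANDLERS = {}


def register_handler(scheme, handler):
    """Register (or override) the loader used for one URL scheme."""
    _HANDLERS[scheme] = handler


def load_file_url(url):
    """Default handler: read a local file referenced by a file:// URL."""
    path = url2pathname(urlparse(url).path)
    return Path(path).read_text(encoding="utf-8")


def load_schema(url):
    """Dispatch to the handler registered for the URL's scheme."""
    scheme = urlparse(url).scheme
    try:
        handler = _HANDLERS[scheme]
    except KeyError:
        raise ValueError(f"no handler registered for scheme {scheme!r}") from None
    return handler(url)


# Only file:// is supported out of the box in this sketch.
register_handler("file", load_file_url)
```

A user who needs another source would call register_handler with their own scheme and loader; unknown schemes fail loudly instead of being fetched implicitly.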
from pykwalify.
I would like to work on this implementation, but I need more input. What do we decide? Is it worth informing the maintainers of the YAML standard?
from pykwalify.
xsi:schemaLocation has always been kind of a dirty hack. Normally, you specify the schema URI in XML with xmlns and then have the application that takes the XML as input provide the schema file.
This is even more true with YAML: You specify the type of the top element and all other elements that will not be resolved automatically to the correct type as a tag:
%YAML 1.2
--- !my:data:schema
some: data
...
Since YAML is designed to be deserialized into values native to the implementation language, it makes little sense to define a schema language for it. The native types the loader transforms the YAML into are the schema. In PyYAML, for example, you can derive from YAMLObject in order to create a schema for your YAML file.
Now I am aware that this project has created a schema language nonetheless, which is fine. But I think it would be a big mistake to implement something akin to xsi:schemaLocation in YAML, because YAML is designed to be portable, and that would be greatly harmed if you start to reference a local file within it. How would you send that file along in some kind of network stream? YAML is designed to work well within streaming environments, which a schema location directive would totally destroy.
However, is an explicit tag at the root element not enough for a validator to search for the relevant schema? Given the YAML above, the loader could say "oh, I know this tag, I have a schema for it" and then validate against that schema. It is even possible to validate only parts of the YAML against a schema:
%YAML 1.2
---
some: data
key with typed value: !my:data:schema
  some: more
  data: here
...
According to the YAML spec, the top value (a mapping) will implicitly get the !!map tag, which allows all elements as content. Then the loader sees a tag on one of the values in it and can validate that subtree according to some schema.
The only difference from the approaches discussed here is that there needs to be a mapping from tags to schema files outside the YAML. I think that is fine; if you use XML within an application, you also ship the schema with the application and do not search for it in xsi:schemaLocation. That attribute is little more than a hint for XML editors, and most of them can also define a URI -> schema file mapping in their configuration. So xsi:schemaLocation is superfluous, alien meta information in your XML file that does not belong there and harms portability. I for one would like to avoid carrying that mistake over to YAML.
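The tag-to-schema mapping described above could be sketched as follows. This is an illustration only: SCHEMA_FOR_TAG, its schema path and schema_for_document are hypothetical, and the root tag is found with a simple regular expression instead of a real YAML parser.

```python
import re

# Hypothetical mapping, kept in the application rather than in the YAML file.
SCHEMA_FOR_TAG = {
    "!my:data:schema": "schemas/my_data_schema.yaml",
}

# Matches a document start marker followed by an explicit root tag.
_ROOT_TAG = re.compile(r"^---[ \t]+(!\S*)", re.MULTILINE)


def schema_for_document(text):
    """Return the schema path for the root tag, or None if unknown or untagged."""
    match = _ROOT_TAG.search(text)
    if match is None:
        return None
    return SCHEMA_FOR_TAG.get(match.group(1))


document = "%YAML 1.2\n--- !my:data:schema\nsome: data\n...\n"
print(schema_for_document(document))
```

Validating only a tagged subtree, as in the second YAML example, would need the parser's node tags; that part is left out of this sketch.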
from pykwalify.
@flyx @nowox I think I will postpone implementing this feature for now.
@flyx I agree with you that taking this over to PyYAML/ruamel.yaml or even the YAML org itself is not the best idea. This validation language is not the "one and only and best YAML validation language" and I do not intend or have any motivation to bring it to that level.
Another thing that has been bothering me about this feature is the security around it all. Say that we implement fetching from an http or git source, for example. Then we would have to sandbox it very well so that it cannot escape and do bad things to your system by making it download something that you did not intend or that can harm your system.
This is kind of a problem for extensions too, but the main difference there is that you cannot, through the data or schema definition, tell pykwalify to download something from a source and then execute it. I might even back off on the entire feature just based on these security implications. It is then better that you implement this kind of feature outside of pykwalify and keep pykwalify as is, where you must specify a schema and data explicitly.
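If schema fetching were implemented outside of pykwalify, one conservative starting point is an explicit allow-list of URL schemes, checked before anything is loaded. This is a sketch under that assumption; check_schema_location and ALLOWED_SCHEMES are hypothetical names, and an allow-list alone is not a sandbox.

```python
from urllib.parse import urlparse

# Assumption for this sketch: only local files are trusted by default.
ALLOWED_SCHEMES = frozenset({"file"})


def check_schema_location(url, allowed_schemes=ALLOWED_SCHEMES):
    """Return the URL unchanged, or raise ValueError if its scheme is not allowed."""
    scheme = urlparse(url).scheme
    if scheme not in allowed_schemes:
        raise ValueError(f"refusing to load schema from {scheme!r} location: {url}")
    return url
```

The caller has to opt in to remote sources explicitly, for example by passing allowed_schemes={"file", "https"}.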
from pykwalify.
I do not plan to implement any of this. The feature seems too far out to really be useful right now.
from pykwalify.