Comments (9)

Grokzen commented on June 23, 2024

The only one I might consider implementing is the first suggestion. The main reason is that I do not like opening the door to mixing data and schema in the same file; I think it is very messy. The best thing about the first option is that, since it is a pointer to the resource, it can be implemented in a way that supports any kind of location for that resource. It would make sense to implement the standard schemes such as file://, git://, ftp:// and so on. It would also let users of the library implement their own handlers in case a default one does not exist yet in the lib.
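
The pointer-plus-handlers idea can be sketched as a small registry keyed by URI scheme. This is a hypothetical illustration, not pykwalify API: `register_handler`, `load_schema_text` and the built-in `file` handler are names invented here, and only `file://` is wired up by default.

```python
"""Sketch: resolve a schema pointer by URI scheme, with pluggable handlers.

All names here are hypothetical; pykwalify does not ship this API.
"""
from pathlib import Path
from typing import Callable, Dict
from urllib.parse import urlparse
from urllib.request import url2pathname

# Maps a URI scheme ("file", "git", ...) to a callable returning schema text.
_HANDLERS: Dict[str, Callable[[str], str]] = {}


def register_handler(scheme: str, handler: Callable[[str], str]) -> None:
    """Register (or replace) the handler used for URIs with this scheme."""
    _HANDLERS[scheme.lower()] = handler


def _file_handler(uri: str) -> str:
    # file:///usr/foo.yaml -> /usr/foo.yaml (url2pathname also decodes %xx escapes)
    return Path(url2pathname(urlparse(uri).path)).read_text(encoding="utf-8")


register_handler("file", _file_handler)  # the only scheme supported by default


def load_schema_text(uri: str) -> str:
    """Dispatch to the handler registered for the URI's scheme."""
    scheme = urlparse(uri).scheme.lower()
    try:
        handler = _HANDLERS[scheme]
    except KeyError:
        raise ValueError(f"no handler registered for scheme {scheme!r}") from None
    return handler(uri)
```

A user-supplied handler would then be added with something like `register_handler("git", fetch_from_git)`, where `fetch_from_git` stands for whatever callable the application provides.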

The downside is that the Python YAML parser does not, out of the box, let you get at the top-of-file directives after the data has been loaded. On the other hand, it is very easy to just scan the file, take the first %SCHEMA line and use it.
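
Scanning the file for the first %SCHEMA line, as suggested, might look like the sketch below. It assumes the directive sits in the prologue before the `---` document start marker, which is where YAML directives have to appear; `find_schema_directive` is a hypothetical helper, not a pykwalify function.

```python
"""Sketch: find the first %SCHEMA directive before the document start marker.

%SCHEMA is the directive proposed in this thread, not part of the YAML spec;
find_schema_directive is a hypothetical helper, not pykwalify API.
"""
import re
from typing import Optional

_DOC_START = re.compile(r"---(?:\s|$)")     # "---" at column 0 ends the prologue
_SCHEMA = re.compile(r"%SCHEMA[ \t]+(\S+)")  # "%SCHEMA <uri>"


def find_schema_directive(text: str) -> Optional[str]:
    for line in text.splitlines():
        if _DOC_START.match(line):
            return None  # directives may only appear before "---"
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments are allowed in the prologue
        if not line.startswith("%"):
            return None  # bare document without a directive prologue
        match = _SCHEMA.match(line)
        if match:
            return match.group(1)
    return None
```

Because the text is scanned directly, this works regardless of whether the YAML parser keeps unknown directives around.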

from pykwalify.

nowox commented on June 23, 2024

I agree with you; I also prefer the first option, even though %SCHEMA may not be recognized by all parsers. That said, it is not a forbidden directive, so I guess we are free to use it.

According to the YAML spec, the only two recognized % directives are %YAML and %TAG. So we could also use something like this:

%TAG !schema! file:///usr/local/share/foo.yml
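
In YAML, a %TAG directive binds a named handle such as `!schema!` to a prefix, and any node tag written as `!schema!suffix` expands to that prefix followed by the suffix. The sketch below (hypothetical helpers `tag_directives` and `expand_tag`, standard library only, verbatim `!<...>` tags not handled) reads the directive and performs the expansion. It makes visible that the spec intends the prefix for building tags, so a validator reusing it as a schema location would have to read the prefix itself.

```python
"""Sketch: read %TAG directives and expand shorthand tags with them.

Helper names are hypothetical. Verbatim tags (!<...>) are not handled.
"""
import re
from typing import Dict

# "%TAG <handle> <prefix>"; a handle is "!", "!!" or a named "!word!".
_TAG_DIRECTIVE = re.compile(r"^%TAG[ \t]+(!(?:[A-Za-z0-9-]*!)?)[ \t]+(\S+)")
# Split a shorthand tag such as "!schema!item" into handle and suffix.
_SHORTHAND = re.compile(r"^(!(?:[A-Za-z0-9-]*!)?)(.*)$")
# Handles available in every document by default, per the YAML 1.2 spec.
_DEFAULT_HANDLES = {"!": "!", "!!": "tag:yaml.org,2002:"}


def tag_directives(text: str) -> Dict[str, str]:
    """Collect handle -> prefix pairs declared before the first '---'."""
    handles: Dict[str, str] = {}
    for line in text.splitlines():
        if re.match(r"---(?:\s|$)", line):
            break  # the directive prologue ends at the document start marker
        match = _TAG_DIRECTIVE.match(line)
        if match:
            handles[match.group(1)] = match.group(2)
    return handles


def expand_tag(tag: str, handles: Dict[str, str]) -> str:
    """Expand a shorthand tag using declared handles plus the defaults."""
    match = _SHORTHAND.match(tag)
    if match is None:
        raise ValueError(f"not a shorthand tag: {tag!r}")
    handle, suffix = match.groups()
    table = {**_DEFAULT_HANDLES, **handles}
    if handle not in table:
        raise ValueError(f"undeclared tag handle {handle!r}")
    return table[handle] + suffix
```

For the line above, `tag_directives` returns `{"!schema!": "file:///usr/local/share/foo.yml"}`, and a node tagged `!schema!item` would expand to `file:///usr/local/share/foo.ymlitem`.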

Last but not least, the name %SCHEMA is perhaps a bit pretentious, in that it implies PyKwalify is (and will become) the standard YAML validator. I see this as a very good thing, but I guess some will not.

The next step would be to add this validation support to the PyYAML module...

Grokzen commented on June 23, 2024

I am still more in favor of the initial suggestion of

%SCHEMA file:///usr/foo.yaml

because it is cleaner and easier to use. At least PyYAML does not choke if I add %SCHEMA at the top of the file, but at the same time I understand that the spec is more in favor of %TAG !schema! .... On the other hand, I also found this in the spec:

Directives are instructions to the YAML processor. This specification defines two directives, “YAML” and “TAG”, and reserves all other directives for future use. There is no way to define private directives. This is intentional.

So any directive can be used, and the spec says processors should tolerate it; if some other client cannot cope with a custom directive, then that client is broken, not pykwalify's use of it :]

nowox commented on June 23, 2024

So let's choose %SCHEMA and make a new standard for %YAML 1.3 that the whole world will use!

The next step will be to extend PyYAML to support PyKwalify...

Grokzen commented on June 23, 2024

And after that, world domination :]

But %SCHEMA it will be. I think that initially only file:// will be supported out of the box, but a plugin system will also be added so that it can be extended to support other location types in the future.
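
One common way to build such a plugin system in Python (a choice made here for illustration, not something the thread settled on) is package entry points: the core ships only the `file` loader, and installed packages contribute loaders for other schemes. The group name `pykwalify.schema_loaders` and the loader signature (URI in, schema text out) are hypothetical.

```python
"""Sketch: discover schema loaders from installed packages via entry points.

The group name below is hypothetical; pykwalify defines no such group.
Each entry point's name is treated as a URI scheme and is assumed to
point at a callable taking a URI string and returning the schema text.
"""
from importlib.metadata import entry_points
from pathlib import Path
from typing import Callable, Dict
from urllib.parse import urlparse
from urllib.request import url2pathname

LOADER_GROUP = "pykwalify.schema_loaders"  # hypothetical entry-point group

Loader = Callable[[str], str]


def load_file(uri: str) -> str:
    """Built-in loader: the only scheme supported out of the box."""
    return Path(url2pathname(urlparse(uri).path)).read_text(encoding="utf-8")


def discover_loaders() -> Dict[str, Loader]:
    loaders: Dict[str, Loader] = {"file": load_file}
    eps = entry_points()
    # Python 3.10+ returns an object with select(); 3.8/3.9 return a plain dict.
    if hasattr(eps, "select"):
        candidates = eps.select(group=LOADER_GROUP)
    else:
        candidates = eps.get(LOADER_GROUP, [])
    for ep in candidates:
        # Plugins add new schemes but may not replace an already-known one.
        if ep.name not in loaders:
            loaders[ep.name] = ep.load()
    return loaders
```

A third-party package would then register, for example, a `git` loader under `[project.entry-points."pykwalify.schema_loaders"]` in its `pyproject.toml` with a line such as `git = "mypkg.loaders:load_git"` (again hypothetical names).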

nowox commented on June 23, 2024

I would like to work on this implementation, but I need more input. What have we decided? Is it worth informing the maintainers of the YAML standard?

flyx commented on June 23, 2024

xsi:schemaLocation has always been kind of a dirty hack. Normally, in XML you specify the schema URI with xmlns and then have the application that takes the XML as input provide the schema file.

This is even more true for YAML: you specify the type of the top-level element, and of any other elements that will not automatically be resolved to the correct type, as a tag:

%YAML 1.2
--- !my:data:schema
some: data
...

Since YAML is designed to be deserialized into values native to the implementation language, it makes little sense to define a schema language for it: the native types the loader transforms the YAML into are the schema. In PyYAML, for example, you can derive from YAMLObject in order to create a schema for your YAML file.

Now, I am aware that this project has created a schema language nonetheless, which is fine. But I think it would be a big mistake to implement something akin to xsi:schemaLocation in YAML. YAML is designed to be portable, and portability would suffer greatly if documents started referencing local files (how would you send that file along in a network stream? YAML is designed to work well in streaming environments, which a schema location directive would break completely).

However, is an explicit tag on the root element not enough for a validator to find the relevant schema? Given the YAML above, the loader could say "oh, I know this tag, I have a schema for it" and then validate against that schema. It is even possible to validate only parts of the YAML against a schema:

%YAML 1.2
---
some: data
key with typed value: !my:data:schema
    some: more
    data: here
...

According to the YAML spec, the top-level value (a mapping) will implicitly get the !!map tag, which allows any elements as content. The loader then sees a tag on one of its values and can validate that subtree against some schema.
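
The tag-driven approach can be sketched as a tree walk that reports every node whose tag has a schema registered in the application. A real implementation would walk a YAML library's node graph (for example the nodes PyYAML's `yaml.compose()` returns, which carry a `.tag`); here a minimal `Node` dataclass stands in so the sketch runs on the standard library alone. The function name and the schema path `schemas/my_data.yaml` are hypothetical.

```python
"""Sketch: locate tagged subtrees that have a schema registered for their tag.

Node stands in for a YAML library's node graph; the registry is application
configuration, not something stored inside the YAML document.
"""
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple

KeyPath = Tuple[Any, ...]


@dataclass
class Node:
    kind: str   # "scalar", "sequence" or "mapping"
    tag: str    # resolved tag, e.g. "tag:yaml.org,2002:map" or "!my:data:schema"
    value: Any  # str, list of Node, or list of (key Node, value Node) pairs


def find_validation_targets(
    node: Node, registry: Dict[str, str], path: KeyPath = ()
) -> List[Tuple[KeyPath, str]]:
    """Return (path, schema_file) for each node whose tag has a registered schema."""
    targets: List[Tuple[KeyPath, str]] = []
    if node.tag in registry:
        targets.append((path, registry[node.tag]))
    if node.kind == "mapping":
        for key, child in node.value:
            # Assumes scalar keys, so key.value is a plain string.
            targets.extend(find_validation_targets(child, registry, path + (key.value,)))
    elif node.kind == "sequence":
        for index, child in enumerate(node.value):
            targets.extend(find_validation_targets(child, registry, path + (index,)))
    return targets
```

For the second document above, with the registry `{"!my:data:schema": "schemas/my_data.yaml"}`, only the value under `key with typed value` is reported; the untagged root mapping is left alone.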

The only difference from the approaches discussed here is that there needs to be a mapping from tags to schema files outside the YAML, which I think is fine. If you use XML within an application, you also ship the schema with the application and do not look it up via xsi:schemaLocation. That attribute is little more than a hint for XML editors, and most of them can also define a URI -> schema file mapping in their configuration. So xsi:schemaLocation is superfluous, alien meta-information in your XML file that does not really belong there and harms portability. I, for one, would like to avoid carrying that mistake over to YAML.

Grokzen commented on June 23, 2024

@flyx @nowox I think I will postpone implementing this feature for now.

@flyx I agree with you that taking this over to PyYAML/ruamel.yaml, or even to the YAML organization itself, is not the best idea. This validation language is not the "one and only, best YAML validation language", and I have neither the intention nor the motivation to bring it to that level.

Another thing that has been bothering me about this feature is the security around it. Say we implemented fetching from an HTTP or git source, for example; then we would have to sandbox it very well so that it cannot escape and harm your system by downloading something you did not intend or something that is outright harmful.
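
One way to bound that risk, rather than sandboxing network fetches, is to refuse every non-local scheme and confine local references to a trusted directory. The sketch below is a hypothetical helper, not pykwalify code; it only illustrates the containment check and does not make remote fetching safe.

```python
"""Sketch: resolve a schema reference while refusing anything outside a
trusted local directory. Network schemes (http, git, ftp, ...) are rejected
outright rather than sandboxed.
"""
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname

ALLOWED_SCHEMES = {"", "file"}  # "" means a plain relative or absolute path


def resolve_schema_path(reference: str, trusted_dir: Path) -> Path:
    parsed = urlparse(reference)
    if parsed.scheme not in ALLOWED_SCHEMES:
        raise PermissionError(f"scheme {parsed.scheme!r} is not allowed")
    if parsed.netloc:
        raise PermissionError("file:// URIs naming a host are not allowed")
    root = trusted_dir.resolve()
    # Joining an absolute path replaces root entirely; the check below catches it.
    candidate = (root / url2pathname(parsed.path)).resolve()
    # resolve() collapses ".." and follows symlinks before the containment check.
    try:
        candidate.relative_to(root)
    except ValueError:
        raise PermissionError(f"{reference!r} points outside {root}") from None
    return candidate
```

References such as `../outside.yaml`, `/etc/passwd` or `http://example.com/s.yaml` all raise `PermissionError`, while `schema.yaml` inside the trusted directory resolves normally.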

This is somewhat of a problem with extensions as well, but the main difference there is that you cannot, through the data or the schema definition, tell pykwalify to download something from a source and then execute it. I might even drop the entire feature based on these security implications alone. It is better to implement this kind of feature outside of pykwalify and keep pykwalify as it is, where you must specify the schema and the data explicitly.

Grokzen commented on June 23, 2024

I do not plan to implement any of this. The feature seems too far out to be really useful right now.
