Comments (31)
Hi. This is an old lib and I have not worked on it for a while, as you can see.
Maybe what I wrote in the readme is a bit harsh, but what I mean by "it will never be supported" is that I believe the Python community should push towards Python 3 and stop using Python 2, and that I will not personally write this repo to support both Python 2 and Python 3. My intention when I first wrote this library was to use it only in my own Python-3-only environment, and that is why I wrote that Python 2 would never be supported.
With that said, if it happens that my code works on Python 2, then that is good, and I can rewrite the readme to say that it works but that I personally will not actively maintain Python 2 compatibility.
If you would like to ensure/fix Python 2 compatibility for this repo, then you are welcome to submit a patch and I will merge it and give you proper credit for the Python 2 support :]
// Grokzen
from pykwalify.
I too would rather leave 2.X behind, but unfortunately I have some users that are still stuck on 2.X and can't move forward. I found one other little hiccup with opening/reading/parsing a configuration file in `__init__.py:init_logging()`. Apparently open() in 2.X returns a byte string whereas StringIO expects a unicode string. I replaced the built-in open() call with codecs.open(p, "r", "utf-8") so that a unicode representation of the logging.ini file is guaranteed regardless of the Python version. Seems to be happy with that. I'll continue to do more testing and at some point I'll likely send you a merge request for your review.
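A minimal sketch of the fix described above (the file name and contents here are illustrative, not the real pykwalify logging.ini): codecs.open() decodes to unicode text on both Python 2 and Python 3, unlike the Python 2 built-in open().

```python
import codecs
import io
import os
import tempfile

# Write a small config file to read back (a stand-in for logging.ini).
path = os.path.join(tempfile.mkdtemp(), "logging.ini")
with codecs.open(path, "w", "utf-8") as f:
    f.write(u"[loggers]\nkeys=root\n")

# codecs.open() hands back decoded text, so the result can be passed
# straight to StringIO on either Python version.
with codecs.open(path, "r", "utf-8") as f:
    text = f.read()

buf = io.StringIO(text)  # would raise TypeError with a Python 2 byte string
assert buf.readline() == u"[loggers]\n"
```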
I appreciate you putting this out there with a friendly license. It looks pretty solid. Hopefully I'll be able to use it in a project I'm working on.
from pykwalify.
I am happy you like it :] I may give it some love next week and update some things I have had in my backlog for a while.
from pykwalify.
I've found another 2.7.x vs. 3.x issue related to calling the base-class of PyKwalify errors but I'm working through it at the moment. I was hoping you might be able to show me how to formulate a schema to validate the following YAML:
```yaml
- mic:
    name: input
    bits: 16
- media:
    name: output
    bits: 32
```
The problem I have is that within the sequence I don't know beforehand the valid names for "mic" and "media". I want to accept an arbitrary key name (e.g. mic/media) but can't figure out how to create such a schema. My best guess is the schema might look something like:
```yaml
type: seq
sequence:
  - type: map
    required: True
    mapping:
      pattern: [s+]   # <- I want to wild-card the name here (e.g. mic or media)
      type: map
      required: True
      mapping:
        name:
          type: str
        bits:
          type: int
```
So it's not clear how to write a schema to validate a sequence of single-element maps (with a map for a key's value) that has an arbitrary key (in this case mic or media). Can your schema validator accommodate such a YAML encoding? It appears that you have to know beforehand, with absolute certainty, the names of all the valid map keys. I just want to validate that these keys match a certain pattern (or even no pattern).
from pykwalify.
Could you use proper GitHub-flavored markdown so the indentation is preserved correctly (https://help.github.com/articles/github-flavored-markdown#syntax-highlighting)? It is hard to restructure your YAML manually.
Do you mean??
```yaml
- mic:
    name: input
    bits: 16
- media:
    name: output
    bits: 32
```
from pykwalify.
Yes, sorry, that's what I intended. It's a sequence of maps with keys mic and media. The key point is that I don't want to have to explicitly provide the "mapping" names 'mic' and 'media' . . . just indicate that these are keys and their value is another map. This "sub-map" is, however, defined. From what I can tell, pykwalify can't describe this use-case since it expects a "mapping" which includes the key name(s) for every map. Does this make sense?
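As a plain-Python sketch of what is being asked for here (the names `validate_entry` and `SUB_SCHEMA` are mine, not part of pykwalify): accept any key name matching a pattern, but strictly validate the sub-map behind it.

```python
import re

# Hypothetical sketch, not pykwalify code: the key name is wild-carded,
# the sub-map is validated strictly against a fixed field/type table.
SUB_SCHEMA = {"name": str, "bits": int}

def validate_entry(entry, key_pattern=r".+"):
    """entry is a one-element dict, e.g. {'mic': {'name': 'input', 'bits': 16}}."""
    (key, sub), = entry.items()                 # exactly one key per map
    if not re.fullmatch(key_pattern, key):
        return False
    # every field of the sub-map must be present and of the right type
    return all(field in sub and isinstance(sub[field], typ)
               for field, typ in SUB_SCHEMA.items())

data = [
    {"mic":   {"name": "input",  "bits": 16}},
    {"media": {"name": "output", "bits": 32}},
]
assert all(validate_entry(e) for e in data)
assert not validate_entry({"mic": {"name": "input"}})   # missing 'bits'
```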
from pykwalify.
I have pushed some changes that should fix some of the issues you have found here. The logging should be cleaned up and work with py2.7 and 3.2+ now. I also dropped support for Python 3.1 because it is deprecated and that made the logging handling a lot easier. It does not work with Python 2.7 yet because of a bug when calling super classes, the same one you found, and I will fix that shortly.
from pykwalify.
Thanks, I'll try to merge your changes with what I'm currently working on. I already fixed the super class calls by using the old-school method of calling the base class, e.g. `PyKwalifyException.__init__(self, ...)`. I believe this is compatible with the 3.x super() mechanism.
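The "old-school" base-class call looks like this (a sketch: the class names follow the thread, but the constructor signatures are illustrative, not the real pykwalify ones). It runs unchanged on both Python 2 and Python 3.

```python
# Explicit base-class call instead of the zero-argument super().__init__(msg),
# which only works on Python 3.
class PyKwalifyException(Exception):
    def __init__(self, msg=""):
        self.msg = msg
        Exception.__init__(self, msg)

class RuleError(PyKwalifyException):
    def __init__(self, msg=""):
        # py2/py3-compatible: name the base class directly
        PyKwalifyException.__init__(self, msg)

err = RuleError("unknown keyword")
assert err.msg == "unknown keyword"
assert isinstance(err, PyKwalifyException)
```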
Did you have a chance to consider my use-case above . . . e.g. accept any name for the map keys 'mic' and 'media' but then strictly validate the sub-map associated with these keys? It appears the schema language has no mechanism to address this construction. Is that interpretation correct?
BTW - thanks for any bug-fixes and/or new features you introduce in the process of providing this support.
from pykwalify.
I think I got your use-case to work by fixing a bug in map validation: when a regex is specified, it was still trying to do a lookup of all keys as if they had a specific rule. With that fixed I could construct the following rules for the following data. It uses a regex to match all keys, and you could then construct a regex to work with either a subset of cases or match any key like I have. It does not feel right, and I have to figure out if I can do something else in the future to make this easier and more intuitive.
Data:

```yaml
- mic:
    name: input
    bits: 16
- media:
    name: output
    bits: 32
```

Schema:

```yaml
type: seq
sequence:
  - type: map
    pattern: ".+"
    mapping:
      name:
        type: str
      bits:
        type: str
```
Try it and see if that is what you are looking for :] If not just let me know.
from pykwalify.
I tried your patch and it appears to work but, as you indicated, it doesn't feel right or necessarily intuitive. This seems to work if the map key references another map (so the "mapping" command applies), but what would happen if the map key's associated value were a sequence instead of a map, such as:
```yaml
- mic:
    - input
    - output
- media:
    - input
    - output
```
How might it support arbitrary nested types? The "mapping" command above actually applies to the sub-type (e.g. nested type) rather than the 'mic' or 'media' map itself. It almost seems like I would expect the schema to look something like this:
```yaml
type: seq
sequence:
  - type: map
    pattern: ".+"
    type: map
    mapping:
      name:
        type: str
      bits:
        type: str
```
In essence, the parent map has no explicit mapping that we're interested in. Instead, we're interested in the sub-type (which happens to be another map). The 'pattern' allows you to identify the key in the map to which the sub-type rule applies. Does this make sense?
from pykwalify.
I think you formatted your second YAML blob there a bit wrong, because that is not valid YAML :] Could you edit it so that it is valid?
from pykwalify.
I edited the YAML above . . . was simply missing a ":" after media. It should now be valid YAML.
from pykwalify.
nonono -_- this one
```yaml
type: seq
sequence:
  - type: map
    pattern: ".+"
    type: map
    mapping:
      name:
        type: str
      bits:
        type: str
```
That is not valid yaml.
The case you bring up is a valid concern, and I don't think the lib currently supports a case like this:
```yaml
- mic:
    - input
      foo
    - output
      bar
- media:
    - input
      opa
    - output
      lopa
```
Here the value for mic & media is a list, but you need map validation when parsing it. Map was, according to the original specification, not supposed to handle that case.
I think the solution to this might be something like this:
```yaml
type: seq
sequence:
  - type: map
    pattern: ".+"
    custom_mapping_type: True
    mapping:
      type: seq
      sequence:
        - type: str
```
Here you can override the default behaviour of map validation, skip the normal mapping rules, and use either a custom validation or another builtin type such as seq.
from pykwalify.
The verbose solution that will work in this very special case is the following:
```yaml
type: seq
sequence:
  - type: map
    mapping:
      mic:
        type: seq
        sequence:
          - type: str
      media:
        type: seq
        sequence:
          - type: str
```
The problem with that solution is that mic & media are hardcoded, but it works.
I have to do some more thinking about how to extend map to cover a lot of other cases, including this one, in a better way.
from pykwalify.
Yes, I got it to work as well, as long as mic and media are known ahead of time and can be explicitly encoded in the schema. I had a YAML configuration where these names could vary, and I could not figure out how to generically encode a schema that would validate the overall structure of the configuration file while allowing wild-cards for the key name. The YAML schemas I've run across all want you to explicitly name the keys, which is probably expected in a rigid schema description. I need/want a little more flexibility, where I can just describe the expected 'type' structure rather than the explicit field names (e.g. expect a map but don't describe it any further . . . or maybe a map with one known key name but ignore the other keys for validation purposes). Basically, as it stands, the Kwalify schema notation doesn't appear to support arbitrary YAML encoding, but instead a strict subset where all the types/fields are known. You can't apply the schema to "part" of the YAML. I suspect this is by design, but it prevents it from accommodating less rigidly defined structures.
from pykwalify.
Okay, from what I see it could be possible to implement a few new things.
This schema could possibly solve the issue you described before, where a key in a map should be a list with some internal mapping rules:
```yaml
type: seq
sequence:
  - type: map
    pattern: ".+"
    mapping:
      - sequence:
          - type: str
```
Or the internal mapping handling could be extended to have a lot more flexibility, with something like this:
```yaml
type: seq
sequence:
  - type: map
    matching-rule: none/one/any/all
    mapping:
      regex;[mic.+]:
        - type: seq
          sequence:
            - type: bool
      media:
        type: int
      foobar:
        type: map
        mapping:
          opa:
            type: bool
```
This should give more flexibility to define multiple rules inside a map, with multiple regex rules combined with exact-matching keys. It would remove the limitation that when a regex is specified at the same level as mapping, that regex is global for the entire map. The matching-rule option could give more control over how many rules have to match if you run with multiple rules. With matching-rule set to "any", keys that do not match any rule are ignored, but if they do match a rule they must validate against it.
And this could be used to validate data like the following, where the internal data can be different in mic, media & foobar:
```yaml
- mic:
    - input
      foo
    - output
      bar
- media:
    - input
      opa
    - output
      lopa
- foobar:
    opa: True
```
I think that with this new ability you could construct a schema that more or less ignores the key names and is centered around the structure of the data rather than whether the explicit key names match or not. You could do this by using regex;[.+] everywhere to match any key and continue to validate the sub-rules.
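The proposed matching-rule behaviour can be sketched in plain Python (keyword names follow the comment above; the rule table and validator are hypothetical, not pykwalify code):

```python
import re

# Each rule pairs a key regex with a validator for the value behind any
# key that matches it; this models "matching-rule: any".
RULES = [
    (re.compile(r"mic.+"),  lambda v: isinstance(v, list)),
    (re.compile(r"media"),  lambda v: isinstance(v, int)),
    (re.compile(r"foobar"), lambda v: isinstance(v, dict)),
]

def validate_map(mapping):
    for key, value in mapping.items():
        matched = [check for rx, check in RULES if rx.fullmatch(key)]
        # "any": keys matching no rule are ignored, but a key that matches
        # at least one rule must validate against one of them
        if matched and not any(check(value) for check in matched):
            return False
    return True

assert validate_map({"microphone": ["input", "output"], "media": 3})
assert validate_map({"unrelated": "ignored"})        # no rule matched
assert not validate_map({"media": "not-an-int"})     # matched but invalid
```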
from pykwalify.
I like the latter approach you described, which has the ability to (optionally) validate the structure (vs. the key names) while still giving you the ability to do both with judicious use of regex patterns. Seems like that would be a great extension which the Ruby implementation of Kwalify appears to lack. Would this be hard to implement in your design?
from pykwalify.
It should not be impossible to implement, but it would add a lot more complexity and more paths that must be handled by map. If all goes well then maybe it could be done tomorrow or the day after. I have to write a lot of new test files to ensure that this works correctly.
from pykwalify.
No complaints from me with regards to your estimated turn-around time to implement such a feature. I'll be interested in trying it out once it's completed. Any chance you could tweak your Error sub-classing (e.g. avoid using the super() method to call the base-class) so that your development branch runs under 2.7.X while you're at it? I could abandon my fork at that point.
from pykwalify.
Yeah, I can look into that also.
from pykwalify.
Well, that was easier than I thought :]
Have a look at c69a7b3 and the 30a.yaml and 30b.yaml files to see an example implementation. I will create more test files to ensure functionality, but this is at least a start where you can test whether it works or not.
from pykwalify.
This 2569a84 should fix the Python 2.7 issue, and you can look at TravisCI to see whether the tests for Python 2.7 pass or not :]
from pykwalify.
Looks like it runs fine on Python 2.7.3 for me. Your 30a.yaml test case worked as well. I'll have to play with it in my application. I'm a little unclear on the "regex;[mi.+]" notation in 30b.yaml. Strange to see the semi-colon, and is it safe to assume the pattern is inside the brackets ([])? So this means regex;[mo.+] doesn't match any of the keys in the example? But then I'm a little confused by
```yaml
regex;[mi.+]:
  type: seq
  sequence:
    - type: bool
```
since it doesn't seem to match the mic value either (I see no bool with either input/output elements). Am I missing something?
from pykwalify.
I used ; to separate the keyword regex from the regex to be used. I could not do it any easier way right now, because the child map of the regex key is used to validate further down the chain. It could probably be possible to do something else, where you define the regex inside the value dict, but that will take some work to change the code.
Yes, the text inside [ ] is the regex that is going to be used. I do not like how it turned out, and I will probably remove the brackets later because they serve no function other than making the regex stick out when reading the schema.
Hmm, if the regex does not validate the child items, then I have to look into that a bit closer to see where the problem is.
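Taking a "regex;[...]" mapping key apart is straightforward: the text before ';' is the keyword, the text inside the brackets is the pattern. This mirrors the notation discussed above, though the parser itself is my sketch, not pykwalify's actual code:

```python
import re

# Key format discussed above: "regex;[<pattern>]".
KEY_RE = re.compile(r"^regex;\[(?P<pattern>.+)\]$")

def split_regex_key(key):
    """Return the embedded pattern, or None for a plain (exact-match) key."""
    m = KEY_RE.match(key)
    return m.group("pattern") if m else None

assert split_regex_key("regex;[mi.+]") == "mi.+"
assert split_regex_key("media") is None

# The extracted pattern is then matched against real document keys:
pattern = split_regex_key("regex;[mi.+]")
assert re.match(pattern, "mic")        # 'mic' starts with 'mi'
assert not re.match(pattern, "media")  # 'media' starts with 'me'
```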
from pykwalify.
This commit should fix the regex problems: b59b641
It was not correctly validating sub-items, so the test file passed even though that was wrong. I think I have to implement some sort of tracking system in the core so that it can be verified that rules are applied and work correctly.
from pykwalify.
Have you ever considered a mechanism to include or import a YAML schema file into another schema? I find myself wanting to build a larger schema that is an aggregate of other schema definitions. I've used this YAML trick to define a subset of a schema that can be reused elsewhere (in the same file) in certain circumstances, e.g.
```yaml
type: map
required: False
mapping:
  position: &position
    type: map
    mapping:
      x:
        type: int
      y:
        type: int

type: map
required: True
mapping:
  polygon:
    type: seq
    sequence:
      - *position
```
By creating a pseudo dummy map (e.g. required: False) and tagging the node (position), I can include it in another schema. This works, although it would probably be better if I could arbitrarily create a named schema and include it elsewhere in another schema using a dedicated keyword, e.g.
```yaml
schema: &position
  type: map
  mapping:
    x:
      type: int
    y:
      type: int

type: map
required: True
mapping:
  polygon:
    type: seq
    sequence:
      - *position
```
I still, however, can't collect these smaller schemas and include them in a larger schema via a file. Also, it's not possible to append a keyword to an existing schema "fragment" (e.g. "required: True" if you wanted to change whether or not a schema fragment is required) or make similar tweaks. Have you ever considered such a feature? I think it would be a very powerful mechanism to decompose a schema into its constituent parts so that they can be re-used in larger aggregated schemas. Less copy-pasting of code chunks that you may later have to change in multiple locations. Your thoughts?
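The anchor trick works because YAML aliases are expanded when the document is loaded, so the validator just sees the fragment repeated. The same reuse can be modeled with plain dicts (a rough model of what a loader produces, not pykwalify internals):

```python
# The &position fragment, as a loader would build it.
position = {
    "type": "map",
    "mapping": {
        "x": {"type": "int"},
        "y": {"type": "int"},
    },
}

# The schema that references *position: reused, not copied, which is
# exactly how YAML aliases behave within one document.
polygon_schema = {
    "type": "map",
    "required": True,
    "mapping": {
        "polygon": {
            "type": "seq",
            "sequence": [position],
        },
    },
}

# The alias points at the same object, mirroring YAML semantics.
assert polygon_schema["mapping"]["polygon"]["sequence"][0] is position
```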
from pykwalify.
I have never thought that you could use YAML that way and create partials and include them elsewhere, cool :]
I like the idea, and here is how I think it should be implemented:
- schema: can be the new keyword for this, but because I won't use any native YAML abilities I have to add some new keyword inside schema:, like schema-name:, to enable name tagging.
- schema: can only be specified at the top level of a file, for easy pre-parsing of all partials before any other rule is parsed.
- I don't know yet how to handle a partial inside a partial, because that can lead to infinite recursion problems if done wrong. I have to work a bit on this to get it right.
- A schema can easily be parsed as a Rule(), so not much new has to be done there.
- A new keyword has to be created for including a partial. An easy one would be "include-schema", and it can only be used at the same place/level where the keyword "type" can be used today. We will see how that works, but I think it should work out.
- It should be possible to specify more than one schema file via the cli with multiple "-s schema.yaml -s schema2.yaml ..." to give the option of having a file with predefined partials already.
- Overriding schema partials should not be possible; if one is defined in two files, an error will be thrown.
- I do not think it should currently be possible to mix a partial with some other defined rule. For example, a partial that is a map cannot be merged with another map just to extend the number of keys that can be validated. That would be a future improvement, but not in the first iteration, because there are some other things I have to change in the validation process for that to work well.
What do you think?
from pykwalify.
That sounds like a pretty reasonable approach. So the only way to "include" another schema file is from the cli with an additional '-s' option, right? I assume there is a programmatic way of doing this as well. Could there be a way (or keyword) to include a schema file from the file-system as well (like an "include/import" keyword) in the schema itself? I guess once you load the base schema you'd have to search it for those "include" keywords in the document and merge in the resulting schema fragments. Perhaps that's what you're trying to avoid?
from pykwalify.
I do not think that I will make an option to directly include another file from inside a schema like this:
```yaml
type: map
required: True
mapping:
  polygon:
    type: seq
    sequence:
      - include-schema:
          file: /foo/bar.yaml
```
Maybe in a second iteration of the feature, but not in the first one.
Yes, both ways of including another file will be supported: via the cli -s flag, and as an argument to Core() as a list of multiple source_files that are parsed in sequence.
One problem I thought of, though, is that schema: as a keyword will not work, because then it would only be possible to have one schema: key per file. There are two solutions. The first is to have a list that contains all partials, but that would make it harder to integrate them into the same file as the main schema that wants to include the partial. The second solution I am thinking of is to use the same approach as regex; and do "schema;schema-id:", because then it is possible to have any number of schema partials in one file. Then you could do:
```yaml
schema;seq:
  type: seq
  sequence:
    - type: int

schema;map:
  type: map
  mapping:
    x:
      type: int

type: map
mapping:
  polygon:
    include-schema:
      id: map
  square:
    include-schema:
      id: seq
```
And with this setup it would be possible to have all definitions in the same file, or you could split all the partials into one file and keep the main schema in a separate file.
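The "schema;&lt;id&gt;" / "include-schema" resolution described above can be sketched as a two-pass transform (this is a hypothetical model of the proposal, not the eventual pykwalify implementation):

```python
# A loaded schema file mixing partials and a main schema, per the proposal.
raw = {
    "schema;seq": {"type": "seq", "sequence": [{"type": "int"}]},
    "schema;map": {"type": "map", "mapping": {"x": {"type": "int"}}},
    "type": "map",
    "mapping": {
        "polygon": {"include-schema": {"id": "map"}},
        "square":  {"include-schema": {"id": "seq"}},
    },
}

# Pass 1: pre-parse, pulling every top-level "schema;<id>" partial into a registry.
partials = {k.split(";", 1)[1]: v for k, v in raw.items() if k.startswith("schema;")}
main = {k: v for k, v in raw.items() if not k.startswith("schema;")}

# Pass 2: replace each include-schema node with the registered fragment.
def resolve(node):
    if isinstance(node, dict):
        if "include-schema" in node:
            return partials[node["include-schema"]["id"]]
        return {k: resolve(v) for k, v in node.items()}
    if isinstance(node, list):
        return [resolve(item) for item in node]
    return node

schema = resolve(main)
assert schema["mapping"]["polygon"]["type"] == "map"
assert schema["mapping"]["square"]["sequence"] == [{"type": "int"}]
```

Note this sketch does not guard against a partial including itself, which is the infinite-recursion concern raised earlier in the thread.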
from pykwalify.
Also, I will close this thread because the original issue is resolved and this thread is getting long :]
Please make a new issue for new features and other things you think of, so that I can close them faster when they are done. I will move this latest feature into a new issue so we can talk about it there and not here.
from pykwalify.
I would definitely want to be able to use multiple partials from the same file (e.g. your regex approach). And my likely use case is a primary schema file that pulls in others to complete a schema definition. It's easier to compose a larger schema built out of smaller (and better tested) parts.
from pykwalify.