dev-cafe / parselglossy
Generic input parsing library, speaking in tongues.
Home Page: https://parselglossy.readthedocs.io
License: MIT License
The following chain of dependencies currently does not work if neither bar nor baz is set:
keywords:
  - name: foo
    type: float
    docstring: "Foo"
  - name: bar
    type: float
    default: "user['foo']"
    docstring: "Bar"
  - name: baz
    type: float
    default: "user['bar']"
    docstring: "Baz"
It seems baz is first set equal to the string "user['foo']" (bar's still-unresolved default, hence a type mismatch), before bar is set to the value of foo. Related to issue #73.
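A minimal sketch, assuming string defaults of the form user['...'] are simply re-evaluated until nothing changes, so that bar gets resolved before baz; resolve_defaults and its arguments are illustrative names, not the library's API:
# Hypothetical sketch: re-evaluate string defaults that reference other
# keywords until a fixed point, so chained defaults resolve in any order.
def resolve_defaults(user, defaults, max_passes=10):
    for _ in range(max_passes):
        changed = False
        for name, expr in defaults.items():
            if name in user:
                continue  # set explicitly by the user or already resolved
            try:
                value = eval(expr, {}, {"user": user})  # e.g. "user['foo']"
            except KeyError:
                continue  # referenced keyword not resolved yet; retry next pass
            user[name] = value
            changed = True
        if not changed:
            break
    missing = set(defaults) - set(user)
    if missing:
        raise ValueError(f"could not resolve defaults for: {sorted(missing)}")
    return user

user = {"foo": 1.0}
defaults = {"baz": "user['bar']", "bar": "user['foo']"}
print(resolve_defaults(user, defaults))  # {'foo': 1.0, 'bar': 1.0, 'baz': 1.0}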
Fails PRs on Travis and my laptop:
https://github.com/dev-cafe/parselglossy/blob/master/.travis.yml#L19
I think it's supposed to be --doctest-modules.
Some programs allow different names to be used for the same keyword/section. For example, CONVTHR might be just as valid as CONVERGENCE_THRESHOLD (or something similar). Do we want to allow for this, and if yes, how would it look? An alias field accepting a list of aliases?
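A minimal sketch of what this could look like, assuming a hypothetical aliases field in the template and a normalization step before validation:
# Hypothetical sketch: map every alias back to the canonical keyword name.
template = [
    {"name": "convergence_threshold", "type": "float",
     "aliases": ["convthr", "conv_thr"]},  # 'aliases' is an assumed field
]

def canonical_names(template):
    mapping = {}
    for kw in template:
        mapping[kw["name"]] = kw["name"]
        for alias in kw.get("aliases", []):
            mapping[alias] = kw["name"]
    return mapping

def normalize(user_dict, mapping):
    return {mapping.get(k, k): v for k, v in user_dict.items()}

print(normalize({"convthr": 1.0e-6}, canonical_names(template)))
# {'convergence_threshold': 1e-06}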
I was thinking about cases where a given keyword might be either a list or just a scalar, that is Union[int, List[int]] (or similar). Do you think this is a good idea? I don't know exactly how it would look in the implementation.
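A minimal sketch of the check itself, independent of how the library's actual type checking is implemented:
# Accept either a bare int or a list of ints (bool is excluded on purpose,
# since bool is a subclass of int in Python).
def matches_int_or_list_of_int(value):
    def is_int(x):
        return isinstance(x, int) and not isinstance(x, bool)
    return is_int(value) or (isinstance(value, list) and all(is_int(x) for x in value))

print(matches_int_or_list_of_int(3))       # True
print(matches_int_or_list_of_int([1, 2]))  # True
print(matches_int_or_list_of_int("3"))     # False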
Currently the following is allowed:
Foo {
bar = true
}
Foo {
bar = false
}
and the last definition takes precedence. This should probably not be allowed.
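A minimal sketch of such a check, assuming the parser can report the section names in order of appearance (the representation is hypothetical):
# Flag repeated section names instead of silently letting the last one win.
from collections import Counter

def check_no_duplicate_sections(section_names):
    duplicates = [name for name, count in Counter(section_names).items() if count > 1]
    if duplicates:
        raise ValueError(f"section(s) defined more than once: {duplicates}")

check_no_duplicate_sections(["Foo", "Bar"])  # fine
check_no_duplicate_sections(["Foo", "Foo"])  # raises ValueError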
We have:
- __init__.py defines various fields like __author__, __maintainers__, and __credits__, which are not widely used, I think.
- AUTHORS.rst, which is more widely recognized and could be used more formally for authorship recognition.
- The author variable in docs/conf.py, which Sphinx uses.
It would be good to consolidate these, so we only need to update in one place. I personally favor using AUTHORS.rst exclusively, only putting __version__ and __all__ in __init__.py, and doing some regex magic in conf.py to update the author variable.
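A minimal sketch of the conf.py side, assuming AUTHORS.rst lists contributors as '* Name' bullet lines (the regex would have to match the file's real layout):
# In docs/conf.py: derive the Sphinx 'author' string from AUTHORS.rst
# instead of hard-coding it.
import re
from pathlib import Path

authors_file = Path(__file__).parent.parent / "AUTHORS.rst"
names = re.findall(r"^\*\s+(.+)$", authors_file.read_text(), flags=re.MULTILINE)
author = ", ".join(names) if names else "dev-cafe"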
Generation of test cases can get a bit unwieldy, given how many different "axes" there are. Hypothesis can help quite a bit, and we should refactor other tests to use it.
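For instance, a property-based test along these lines; the import path for type_matches is an assumption here, only its use with a value and a type string is taken from the issues below:
# Let Hypothesis generate the inputs along one "axis" (lists of ints)
# instead of hand-writing parametrized cases.
from hypothesis import given, strategies as st
from parselglossy.utils import type_matches  # import path assumed

@given(st.lists(st.integers(), min_size=1))
def test_list_of_ints_matches_declared_type(xs):
    assert type_matches(xs, "List[int]")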
The following type error in the input file (getkw grammar)
some_int = a
correctly spits out the error message
Actual (str) and declared (int) types do not match.
but if the leading character is a number
some_int = 1a
then there is a silent exit, and all input parameters defined below this point in the input file will not be parsed and thus get their default values. This means that the user will not get notified unless some of those lower keywords don't have defaults.
We need to prepare tutorials on how to use the library. We will need different tutorials ranging from basic to advanced uses:
- Basic/getting started: dict-s.
- Intermediate:
- Advanced:
Should we add the possibility to perform arbitrary operations on the input tree? At the moment we have callables for defaulting and predicate checking. The former can modify the input tree, but only to fill default values, the latter are only allowed to return booleans.
@stigrj has a really good use case of modifying an entry in the dictionary to a value with an entirely different type (reading in the molecule from a multiline string to a dictionary of lists).
We could add an actions field to keywords as a hook to run arbitrary operations after predicate checking. This sort of throws the whole type checking thing out of the window, but gives a lot of flexibility to the users of the library. I can see the argument against: parselglossy parses and gives you an input tree; what you do afterwards is up to you, and if you screw it up, it's on you.
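A minimal sketch of what such a hook could do; the actions field and the call signature are purely hypothetical:
# After defaulting and predicate checking, run an optional per-keyword
# callable that may replace the value with something of a different type.
def run_actions(input_dict, actions):
    for keyword, action in actions.items():
        if keyword in input_dict:
            input_dict[keyword] = action(input_dict[keyword])
    return input_dict

def parse_coords(raw):
    # e.g. turn a multiline string into a dictionary of lists
    atoms = [line.split() for line in raw.strip().splitlines()]
    return {"symbols": [a[0] for a in atoms],
            "coords": [[float(x) for x in a[1:]] for a in atoms]}

user = {"coords": "He 0.0 0.0 0.0\nNe 1.0 0.0 0.0"}
print(run_actions(user, {"coords": parse_coords}))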
The following is parsed by the grammar, but does not pass validation.
name<foo> {
... keywords ...
.... sections ...
}
This totally slipped through the cracks and I am unsure how to solve it.
The CLI would require all required input files to be provided and would assume no defaults.
Motivation:
It would require a bit more typing, but this will be less of a problem for users of the library than it is now for the developers during debugging.
We have dependencies for development (specified in Pipfile), testing (installed with pip on CI services, see the YAML files), and deployment (installed with pip install parselglossy).
This is problematic, as we can easily lose track of dependencies in this way.
It would be great to have the possibility to use keyword values as defaults for other related keywords. Something like this in the predicate syntax:
keywords:
  - keyword: precision
    type: float
    default: 1.0e-6
  - keyword: threshold
    type: float
    default: "value = 10 * input_dict['precision']"
In the discussion over #51 it emerged that filesystem paths should be allowed to have richer semantics than simple strings allow.
The bot created this issue to inform you that pyup.io has been set up on this repo.
Once you have closed it, the bot will open pull requests for updates as soon as they are available.
With #42 I renamed documentation to docstring. @bast commented: "would someone who has never heard of docstrings know what this is? Is documentation more intuitive? Also docstrings are really for documentation extracted out of source code and I am not sure whether the input blueprint counts as source code."
The only reason why I did the rename is that I consistently misspell "documentation" when coding and I think it's a bit too long of a name for a field.
Let us discuss and agree on something.
I think the grammar definitions should be part of the tutorials, not part of the core library.
I prepared and published v0.5.0, but the action was not triggered and thus no PyPI package was uploaded.
This is what I tried:
$ poetry install -v
$ poetry run parselglossy --help
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 961, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 973, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'parseglossy'
Then (not knowing poetry well enough) I created a new folder and a new venv, pip-installed from the source folder, and then:
$ parselglossy --help
Traceback (most recent call last):
File "/home/user/tmp/foo/venv/bin/parselglossy", line 5, in <module>
from parseglossy.cli import cli
ModuleNotFoundError: No module named 'parseglossy'
type_matches([], 'List[T]') is true for any underlying type T. Should type_matches throw on empty lists?
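For context, this behaviour falls out naturally from element-wise checking, because all() over an empty iterable is vacuously True; a minimal illustration (not the library's actual code):
# An empty list satisfies an element-wise check for *any* element type.
print(all(isinstance(x, int) for x in []))  # True
print(all(isinstance(x, str) for x in []))  # True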
In good old getkw, strings are accepted both with and without quotes:
some_string = foo
another_string = 'bar'
yet_another_string = "baz"
but the getkw grammar in parselglossy fails to accept quoteless strings if they are part of a section:
some_string = foo # works
another_string = 'bar' # works
yet_another_string = "baz" # works
some_section {
some_string = foo # fails
another_string = 'bar' # works
yet_another_string "baz" # works
}
I think main can be removed. At least it's the impression I get from Click's docs: http://click.palletsprojects.com/en/7.x/setuptools/#setuptools-integration
So first of all I am not sure why we use tox. For me this is just one more layer. I would use pytest directly. One motivation may be that it runs pytest and flake8 in one step.
Speaking of which, I get many flake8 warnings and a non-zero exit when running locally, but I don't see these on Travis, which confuses me.
We took them out because they stalled for no apparent reason in #84.
#45 drops Python 3.5 support. The lack of dictionary ordering makes testing of the API and CLI quite painful. Support could be restored by sorting keys when serializing JSON and/or by consistently using OrderedDict. We need to assess how important it is to support 3.5.
[Pasting here notes from our meeting last week]
This should be easy to fix with a try/except cast, but I postponed this. This is the reason why the test_validation reference currently contains the value as a string.
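For reference, the try/except cast alluded to could be as simple as this (illustrative only):
# Try to coerce a string value to the declared type; fall back to the
# original value (or raise a proper validation error) if the cast fails.
def coerce(value, declared_type=float):
    try:
        return declared_type(value)
    except (TypeError, ValueError):
        return value

print(coerce("1.0e-6"))  # 1e-06, a float rather than a string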
I was not aware of this, but apparently reading from/writing to JSON can mess up the precision of floating point numbers. I don't really know how to test and prevent this...
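One way to test it, at least for Python's own json module, would be a property-based round-trip check (a sketch, not something we ship):
# Round-trip floats through JSON text and require exact equality.
import json
from hypothesis import given, strategies as st

@given(st.floats(allow_nan=False))
def test_json_roundtrip_preserves_floats(x):
    assert json.loads(json.dumps(x)) == x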
When I click on "module index" I land on a nonexistent page: https://parselglossy.readthedocs.io/en/latest/py-modindex.html
Should it be a section in the README or in the documentation proper or both?
Check that the keywords stanza has no sections in the template.
The development packages in Pipfile are in two classes:
Since it's unfair to inflict upon the world the latter, I need to figure out a way to keep the two things separate.
Follow-up from #20
It would be nice to have the validator not fail immediately, but rather keep going and prepare a report of all failures. The user might then get a chance to fix all failures at once, rather than going back and forth (fix one, rerun, fix next, rerun, etc.).
I would :-)
Python 3.4 reached end of life yesterday:
https://devguide.python.org/#status-of-python-branches
Let's submit a paper to JOSS! I have a branch paper where I wrote some notes: https://github.com/dev-cafe/parselglossy/blob/paper/paper/paper.md
Can we have circular dependencies in default and predicate callables? If not, we need to document why. If yes, we need a way to stop parsing with a meaningful error message.
Much as in #8: to show off the generality of our approach and be a drop-in replacement for the DALTON input parser.
The following template.yml
keywords:
  - name: foo
    type: int
    docstring: foo
  - name: bar
    type: int
    docstring: bar
  - name: baz
    type: int
    docstring: baz
is documented using the CLI command parselglossy doc template.yml to give:
**Keywords**
:foo: foo
**Type** ``int``
**Keywords**
:foo: foo
**Type** ``int``
:bar: bar
**Type** ``int``
**Keywords**
:foo: foo
**Type** ``int``
:bar: bar
**Type** ``int``
:baz: baz
**Type** ``int``
As pointed out by @stigrj it would be good to not force case-sensitivity upon the users. I see some options:
- Handle it in api.lex (and expose it in the CLI). Something like --case upper (--case lower)
Upon parsing coordinate sections such as those used when specifying atomic coordinates
$coords
He 0.0 0.0 0.0
$end
or solvation cavity spheres
$spheres
0.0 0.0 0.0 4.0
$end
the parser ignores all whitespace and newlines between $start and the actual content, but does not ignore whitespace and newlines between the content and $end. As a result, the following sections are not parsed identically:
$coords
He 0.0 0.0 0.0
$end
$coords
He 0.0 0.0 0.0
$end
$coords
He 0.0 0.0 0.0$end
These result in the following strings, respectively
"He 0.0 0.0 0.0\n"
"He 0.0 0.0 0.0\n "
"He 0.0 0.0 0.0"
The expected output for all is (at least to me) the last one. This could become a bit problematic when the user indents these sections (very common to do), and some type of sanity checking is performed on the data. Consider the following
lines = user_dict['Molecule']['coords'].splitlines()
print(lines)
which results in, for the three examples:
["He 0.0 0.0 0.0"]
["He 0.0 0.0 0.0", " "]
["He 0.0 0.0 0.0"]
The middle example has resulted in a spurious, whitespace-only list element. strip()-ing before split()-ing fixes the issue, but parselglossy should probably strip all extra whitespace under the hood.
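A minimal sketch of the kind of normalization meant here, applied to the raw string before it ever reaches the user:
# Strip per-line whitespace and drop blank lines, so all three variants
# above collapse to the same string.
def normalize_block(raw):
    lines = [line.strip() for line in raw.splitlines()]
    return "\n".join(line for line in lines if line)

for raw in ["He 0.0 0.0 0.0\n", "He 0.0 0.0 0.0\n ", "He 0.0 0.0 0.0"]:
    print(repr(normalize_block(raw)))  # 'He 0.0 0.0 0.0' in all three cases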
I think predicates should be reported in the autogenerated documentation. It doesn't seem hard to implement, but I wanted to get your opinion on this.
It seems like it's not possible to omit sections entirely even if all their keywords have default values.
The following input.yml
foobar: false
validated with this template.yml
keywords:
  - keyword: foobar
    type: bool
    documentation: |
      foobar
sections:
  - section: foo
    keywords:
      - keyword: bar
        type: bool
        default: true
        documentation: |
          foo.bar
gives the following error:
input_dict = validate_node(input_dict, template_dict)
File "parselglossy/validate.py", line 168, in validate_node
input_section = input_dict[section]
KeyError: 'foo'
There exists this neat library https://pydantic-docs.helpmanual.io that could help us with validation. At a glance, the advantage seems to be that defining new data types is equivalent to adding a class with decorated methods. This could certainly help in making parselglossy customizable and possibly in solving #51. The disadvantage is that it requires Python 3.6+.
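For illustration, a tiny pydantic model doing the kind of validation we do by hand; the model and its fields are made up for this example:
# pydantic validates (and coerces) on construction and collects all errors.
from typing import List
from pydantic import BaseModel, ValidationError

class Scf(BaseModel):
    max_iterations: int = 50
    thresholds: List[float] = [1.0e-6]

try:
    Scf(max_iterations="not-a-number")
except ValidationError as exc:
    print(exc)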
And keep the package version in one place only, in the top level __init__.py.