parselglossy's People

Contributors

bast robertodr stigrj

Forkers

stigrj

parselglossy's Issues

Chain of dependencies

Description

The following chain of dependencies currently does not work if neither bar nor baz is set:

keywords:
  - name: foo
    type: float
    docstring: "Foo"
  - name: bar
    type: float
    default: "user['foo']"
    docstring: "Bar"
  - name: baz
    type: float
    default: "user['bar']"
    docstring: "Baz"

It seems baz is first set equal to the string "user['foo']" (bar's unevaluated default, hence a type mismatch) before bar itself is resolved to the value of foo. Related to issue #73.
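
A fixed-point resolution pass would handle such chains. A minimal sketch, assuming defaults that refer to other keywords are exactly of the form "user['name']" (hypothetical helper, not the actual implementation):

def resolve_defaults(user, defaults):
    # Repeatedly evaluate defaults until a fixed point is reached;
    # entries of the form "user['name']" refer to another keyword.
    pending = {k: v for k, v in defaults.items() if k not in user}
    while pending:
        progressed = False
        for name, spec in list(pending.items()):
            if isinstance(spec, str) and spec.startswith("user['"):
                ref = spec[len("user['"):-len("']")]
                if ref not in user:
                    continue  # dependency not resolved yet
                user[name] = user[ref]
            else:
                user[name] = spec
            del pending[name]
            progressed = True
        if not progressed:
            raise ValueError(f"unresolvable defaults: {sorted(pending)}")
    return user

resolve_defaults({"foo": 0.1}, {"bar": "user['foo']", "baz": "user['bar']"})
# {'foo': 0.1, 'bar': 0.1, 'baz': 0.1}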

Deprecation mechanism for sections or keywords

Some programs allow different names to be used for the same keyword/section. For example, CONVTHR might be just as valid as CONVERGENCE_THRESHOLD (or something similar). Do we want to allow for this, and if so, how would it look? An alias field accepting a list of aliases, as sketched below?
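
If we go the alias route, the template field might look like this (the aliases field is hypothetical, sketched in the YAML style of the existing templates):

keywords:
  - name: convergence_threshold
    type: float
    docstring: "Convergence threshold"
    # hypothetical field: alternative accepted spellings
    aliases:
      - convthr
      - conv_threshold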

Consider adding Union type

I was thinking about cases where a given keyword might be either a list or just a scalar, that is, Union[int, List[int]] (or similar). Do you think this is a good idea? I don't know exactly how it would look in the implementation.
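
A minimal sketch of what a union check could look like (hypothetical helper, not parselglossy's type_matches; element types of lists are not checked here):

def matches_union(value, allowed):
    # accept the value if it matches any member of the union
    return any(isinstance(value, t) for t in allowed)

matches_union(3, (int, list))       # True: scalar form
matches_union([1, 2], (int, list))  # True: list form
matches_union("a", (int, list))     # False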

Multiple definitions of keywords

  • parselglossy version: 0.3.0

Description

Currently the following is allowed

Foo {
  bar = true
}

Foo {
  bar = false
}

and the last definition takes precedence. This should probably not be allowed.
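
Catching this amounts to refusing duplicates when the parsed sections are assembled into the tree. A plain-dict sketch, not the actual parser code:

def add_section(tree, name, contents):
    # refuse a second definition instead of silently overwriting
    if name in tree:
        raise ValueError(f"Section '{name}' defined more than once")
    tree[name] = contents

tree = {}
add_section(tree, "Foo", {"bar": True})
add_section(tree, "Foo", {"bar": False})  # raises ValueError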

Autogenerate list of authors in docs from AUTHORS.rst

We have:

  • __init__.py defines various fields, like __author__, __maintainers__, and __credits__, which are not widely used, I think.
  • AUTHORS.rst, which is more widely recognized and could be used more formally for authorship recognition.
  • the author variable in docs/conf.py, which Sphinx uses.

It would be good to consolidate these, so we only need to update authorship in one place. I personally favor using AUTHORS.rst exclusively, keeping only __version__ and __all__ in __init__.py, and doing some regex magic in conf.py to update the author variable.
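
The regex part could be as simple as this sketch, assuming AUTHORS.rst lists authors as "* Name" bullets (an assumption about the file's format):

import re
from pathlib import Path

# docs/conf.py: derive Sphinx's author variable from AUTHORS.rst
# instead of hard-coding it
authors_rst = Path(__file__).resolve().parent.parent / "AUTHORS.rst"
author = ", ".join(re.findall(r"^\* (.+)$", authors_rst.read_text(), re.MULTILINE))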

Refactor tests

Generation of test cases can get a bit unwieldy, given how many different "axes" there are. Hypothesis can help quite a bit, and we should refactor the other tests to use it.
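
For illustration, a property-based test in the Hypothesis style (a generic sketch, not tied to our test suite):

from hypothesis import given, strategies as st

# Hypothesis generates the test cases for us, instead of us
# enumerating every combination along every axis by hand.
@given(st.lists(st.floats(allow_nan=False)))
def test_reverse_is_involutive(xs):
    assert list(reversed(list(reversed(xs)))) == xs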

Input parsing is silent on some type errors

  • parselglossy version: 0.7.0
  • Python version: 3.8.5
  • Operating System: ubuntu20.04

Description

The following type error in the input file (getkw grammar)

some_int = a

correctly spits out the error message

Actual (str) and declared (int) types do not match.

but if the leading character is a number

some_int = 1a

then there is a silent exit, and all input parameters defined below this point in the input file are not parsed and thus get their default values. This means the user will not be notified unless some of those lower keywords lack defaults.
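
For illustration, this is typical pyparsing behavior: without parseAll=True the parser matches a valid prefix and silently stops at the trailing garbage (a generic sketch, not the actual getkw grammar):

import pyparsing as pp

some_int = pp.pyparsing_common.signed_integer
some_int.parseString("1a")                 # succeeds, matching only the leading "1"
some_int.parseString("1a", parseAll=True)  # raises pp.ParseException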

Prepare tutorials

We need to prepare tutorials on how to use the library. We will need different tutorials ranging from basic to advanced uses:
Basic/getting started:

  • Using the library in Python code. This will show how to use actual input files and plain dicts.
  • How to go from installing the library, through adopting one of the available grammars, to hooking it up to a program in a compiled language. The ancillary repository for C++, C, and Fortran should be ready for this purpose.

Intermediate:

  • Composing inputs with different grammars.

Advanced:

  • Defining new grammars. This is needed, but I don't want to write a pyparsing tutorial.

Arbitrary callbacks on the input tree

Should we add the possibility to perform arbitrary operations on the input tree? At the moment we have callables for defaulting and predicate checking. The former can modify the input tree, but only to fill in default values; the latter are only allowed to return booleans.
@stigrj has a really good use case: modifying an entry in the dictionary to a value with an entirely different type (reading the molecule from a multiline string into a dictionary of lists); see the sketch below.
We could add an actions field to keywords: a hook to run arbitrary operations after predicate checking. This somewhat throws the whole type-checking machinery out the window, but gives users of the library a lot of flexibility. I can see the argument against: parselglossy parses and gives you an input tree; what you do with it afterwards is up to you, and if you screw it up, it's on you.
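
For concreteness, the kind of action callback the use case above would need (a hypothetical function; the actions hook itself is only a proposal):

def parse_coords(value: str) -> dict:
    # turn a multiline coordinate string into a dictionary of lists
    atoms, positions = [], []
    for line in value.strip().splitlines():
        symbol, *xyz = line.split()
        atoms.append(symbol)
        positions.append([float(x) for x in xyz])
    return {"atoms": atoms, "positions": positions}

parse_coords("He 0.0 0.0 0.0")
# {'atoms': ['He'], 'positions': [[0.0, 0.0, 0.0]]}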

CLI with minimal/none defaults and writing output to stdout

The CLI would require all input files to be provided explicitly and would assume no defaults.

Motivation:

  • explicit is better than implicit
  • less surprising
  • more flexibility for users to name/place files where they like
  • when writing output to stdout, we do not need to check whether we overwrite files
  • simpler code

This would require a bit more typing, but that will be less of a problem for users of the library than the current behavior is for developers during debugging.

Consolidate specification of dependencies

We have dependencies for development (specified in Pipfile), testing (installed with pip on CI services, see the YAML files), and deployment (installed with pip install parselglossy).
This is problematic as we can easily lose track of dependencies in this way.

Get default value from other keyword

Description

It would be great to have the possibility to use keyword values as defaults for other related keywords. Something like this in the predicate syntax:

keywords:
  - keyword: precision
    type: float
    default: 1.0e-6
  - keyword: threshold
    type: float
    default: "value = 10 * input_dict['precision']"

Add a Path type

In the discussion over #51 it emerged that filesystem paths should be allowed richer semantics than those afforded by simple strings.
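
For illustration, a richer path check could build on pathlib (hypothetical helper and semantics, not a proposed API):

from pathlib import Path

def check_path(value: str, must_exist: bool = False) -> Path:
    # richer semantics than a plain string: user expansion plus an
    # optional existence check
    p = Path(value).expanduser()
    if must_exist and not p.exists():
        raise ValueError(f"path '{p}' does not exist")
    return p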

Initial Update

The bot created this issue to inform you that pyup.io has been set up on this repo.
Once you have closed it, the bot will open pull requests for updates as soon as they are available.

Documentation field name: "docstring" vs "documentation" vs something else

With #42 I renamed documentation to docstring. @bast commented: "would someone who has never heard of docstrings know what this is? Is documentation more intuitive? Also docstrings are really for documentation extracted out of source code and I am not sure whether the input blueprint counts as source code."
The only reason I did the rename is that I consistently misspell "documentation" when coding, and I think it's a bit too long a name for a field.
Let us discuss and agree on something.

Not sure how to run the script/generator

This is what I tried:

$ poetry install -v
$ poetry run parselglossy --help

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 961, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 973, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'parseglossy'

Then (not knowing poetry well enough) I created a new folder and a new venv, and pip installed from the source folder; then:

$ parselglossy --help

Traceback (most recent call last):
  File "/home/user/tmp/foo/venv/bin/parselglossy", line 5, in <module>
    from parseglossy.cli import cli
ModuleNotFoundError: No module named 'parseglossy'

Validation of empty lists

type_matches([], 'List[T]') is true for any underlying type T. Should type_matches throw on empty lists?
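
This is the usual vacuous truth of element-wise checks; for illustration:

# an empty list vacuously satisfies any element-type check
all(isinstance(x, int) for x in [])  # True
all(isinstance(x, str) for x in [])  # True as well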

Strings in getkw grammar

  • parselglossy version: 25e4337
  • Python version: 3.7.0
  • Operating System: Ubuntu-18.10

Description

In good old getkw, strings are accepted both with and without quotes:

  some_string = foo
  another_string = 'bar'
  yet_another_string = "baz"

but the getkw grammar in parselglossy fails to accept quoteless strings if they are part of a section:

some_string = foo          # works
another_string = 'bar'     # works
yet_another_string = "baz" # works

some_section {
  some_string = foo        # fails
  another_string = 'bar'   # works
  yet_another_string "baz" # works
}

Tox and flake8

So, first of all, I am not sure why we use tox. For me this is just one more layer; I would use pytest directly. One motivation may be that it runs pytest and flake8 in one step.

Speaking of which, I get many flake8 warnings (and a non-zero exit) when running locally, but I don't see these on Travis, which confuses me.

Restore Python 3.5 support

#45 drops Python 3.5 support. The lack of dictionary ordering makes testing of the API and CLI quite painful. Support could be restored by sorting keys when serializing JSON and/or by consistently using OrderedDict. We need to assess how important it is to support 3.5.
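
The JSON half of that is a one-liner (standard library behavior, shown for illustration):

import json

d = {"b": 2, "a": 1}
# sort_keys makes the serialization deterministic regardless of the
# in-memory dict ordering, which Python 3.5 does not guarantee
json.dumps(d, sort_keys=True)  # '{"a": 1, "b": 2}'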

Create an example with an input template and input validation

[Pasting here notes from our meeting last week]

  • While the tool will allow "any" grammar, try to express the input structure as YAML.
  • Keywords are required to be documented.
  • Demonstrate a cross-section validation.
  • In validation we can use predicates which allow "any" valid Python.
  • Keywords either have defaults or are required.

JSON and floating point precision

I was not aware of this, but apparently reading from/writing to JSON can mess up the precision of floating-point numbers. I don't really know how to test for and prevent this...
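
A property-based round-trip test would at least detect such losses (a sketch; Python's own json module round-trips floats exactly since the shortest-repr change in 3.1, so any loss would come from other layers):

import json
from hypothesis import given, strategies as st

@given(st.floats(allow_nan=False, allow_infinity=False))
def test_json_roundtrips_floats(x):
    # serialize and parse back; the value should survive exactly
    assert json.loads(json.dumps(x)) == x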

Separate development requirements

The development packages in Pipfile are in two classes:

  1. the stuff that everyone should use to develop parselglossy, and
  2. the stuff that only I use to develop parselglossy (like the language server).

Since it's unfair to inflict the latter upon the world, I need to figure out a way to keep the two separate.

Validation failures

Follow-up from #20
It would be nice to have the validator not fail immediately, but rather keep going and prepare a report of all failures. The user would then get a chance to fix all failures at once, rather than going back and forth (fix one, rerun, fix the next, rerun, and so on).
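
A sketch of the collect-then-report pattern (hypothetical structure, not the actual validator):

def validate(tree, checks):
    # 'checks' maps a keyword name to a predicate on the input tree;
    # run every check and report all failures at once
    failures = [name for name, predicate in checks.items() if not predicate(tree)]
    if failures:
        raise ValueError("Validation failed for: " + ", ".join(failures))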

Repeated keywords in the autogenerated documentation

  • parselglossy version: b28084a
  • Python version: 3.7.0
  • Operating System: Ubuntu-18.10

What I Did

The following template.yml

keywords:
  - name: foo
    type: int
    docstring: foo
  - name: bar
    type: int
    docstring: bar
  - name: baz
    type: int
    docstring: baz

is documented using the CLI command parselglossy doc template.yml to give:

**Keywords**
 :foo: foo

  **Type** ``int``

**Keywords**
 :foo: foo

  **Type** ``int``

 :bar: bar

  **Type** ``int``

**Keywords**
 :foo: foo

  **Type** ``int``

 :bar: bar

  **Type** ``int``

 :baz: baz

  **Type** ``int``

Case-insensitive parsing and validation

As pointed out by @stigrj it would be good to not force case-sensitivity upon the users. I see some options:

  1. Do it before we parse. This would mean the programmer using the library has to normalize the case of the input file, and the validation template has to adhere to this choice. That is, if normalization is uppercase (lowercase), then the validation template must use uppercase (lowercase).
  2. As point 1, but we add an option to api.lex (and expose it in the CLI), something like --case upper (--case lower).
  3. We let the grammar take care of case normalization, by adding a parse action to the various tokens (see the sketch after this list). If we change the parsing library (from pyparsing to lark, for example), this will not carry over.
  4. We do case normalization at the validation level. I think this is the most invasive option of all.
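
Option 3 as a sketch, using pyparsing's parse actions (an illustrative grammar fragment, not the actual getkw grammar):

import pyparsing as pp

keyword = pp.Word(pp.alphas, pp.alphanums + "_")
# normalize the matched token to lowercase at parse time
keyword.setParseAction(lambda toks: toks[0].lower())

keyword.parseString("CONVERGENCE_THRESHOLD")[0]  # 'convergence_threshold'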

Whitespace and newlines in coordinate sections between content and $end are not stripped

  • parselglossy version: 0.7.0
  • Python version: 3.9.2
  • Operating System: MacOS
  • Context: MRChem

Description

Upon parsing coordinate sections such as those used when specifying atomic coordinates

$coords
He 0.0 0.0 0.0
$end

or solvation cavity spheres

$spheres
0.0 0.0 0.0 4.0
$end

the parser ignores all whitespace and newlines between the opening tag (e.g. $coords) and the actual content, but does not ignore whitespace and newlines between the content and $end. As a result, the following sections are not parsed identically:

$coords
He 0.0 0.0 0.0
$end

$coords
He 0.0 0.0 0.0
               $end

$coords
He 0.0 0.0 0.0$end

These result in the following strings, respectively:

"He 0.0 0.0 0.0\n"
"He 0.0 0.0 0.0\n               "
"He 0.0 0.0 0.0"

The expected output is (at least to me) the last one in all three cases. This could become a bit problematic when the user indents these sections (which is very common) and some type of sanity checking is performed on the data. Consider the following:

lines = user_dict['Molecule']['coords'].splitlines()
print(lines)

which, for the three examples, prints:

["He 0.0 0.0 0.0"]
["He 0.0 0.0 0.0", "              "]
["He 0.0 0.0 0.0"]

The middle example ends up with an extra, whitespace-only list element. strip()ing before split()ing fixes the issue on the user side, but parselglossy should probably strip all extra whitespace under the hood.
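
Under-the-hood stripping could be a one-line parse action on the block content (a sketch assuming the body between the tags arrives as a single token, which may not match the actual grammar):

import pyparsing as pp

content = pp.SkipTo(pp.Literal("$end"))
# strip leading/trailing whitespace and newlines from the raw block
content.setParseAction(lambda toks: toks[0].strip())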

Omitting sections with all defaults

  • parselglossy version: b51e249
  • Python version: 3.7.0
  • Operating System: Ubuntu-18.10

Description

It seems like it's not possible to omit sections entirely even if all their keywords have default values.

What I Did

The following input.yml

foobar: false

validated with this template.yml

keywords:
  - keyword: foobar
    type: bool
    documentation: |
      foobar
sections:
  - section: foo
    keywords:
      - keyword: bar
        type: bool
        default: true
        documentation: |
          foo.bar

gives the following error:

    input_dict = validate_node(input_dict, template_dict)
  File "parselglossy/validate.py", line 168, in validate_node
    input_section = input_dict[section]
KeyError: 'foo'
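
A plausible fix is to fall back to an empty section in validate_node, so that defaults can still be filled in (an assumption about the fix, not the actual change):

# instead of: input_section = input_dict[section]
input_section = input_dict.get(section, {})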

pydantic for validation

There exists this neat library, https://pydantic-docs.helpmanual.io, that could help us with validation. At a glance, the advantage seems to be that defining new data types amounts to adding a class with decorated methods. This could certainly help in making parselglossy customizable, and possibly solve #51. The disadvantage is that it requires Python 3.6+.
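
For a taste, a sketch with pydantic's (v1-era) validator decorator, reusing the precision/threshold example from above (a hypothetical model, not an integration proposal):

from pydantic import BaseModel, validator

class Options(BaseModel):
    precision: float = 1.0e-6
    threshold: float = None

    @validator("threshold", pre=True, always=True)
    def threshold_from_precision(cls, v, values):
        # derive the default from another keyword, cf. the
        # "Get default value from other keyword" issue above
        return 10 * values["precision"] if v is None else v

Options().threshold  # 1e-05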
