parselglossy's People

Contributors

bast robertodr stigrj

Forkers

stigrj

parselglossy's Issues

Chain of dependencies

Description

The following chain of dependencies currently does not work if neither bar nor baz is set:

keywords:
  - name: foo
    type: float
    docstring: "Foo"
  - name: bar
    type: float
    default: "user['foo']"
    docstring: "Bar"
  - name: baz
    type: float
    default: "user['bar']"
    docstring: "Baz"

It seems baz is first set equal to the string "user['foo']" (bar's unevaluated default, hence a type mismatch) before bar itself is resolved to the value of foo. Related to issue #73.
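
A fixed-point resolution pass would handle such chains. A minimal sketch, assuming defaults that refer to other keywords are exactly of the form "user['name']" (hypothetical helper, not the actual implementation):

def resolve_defaults(user, defaults):
    # Repeatedly evaluate defaults until a fixed point is reached;
    # entries of the form "user['name']" refer to another keyword.
    pending = {k: v for k, v in defaults.items() if k not in user}
    while pending:
        progressed = False
        for name, spec in list(pending.items()):
            if isinstance(spec, str) and spec.startswith("user['"):
                ref = spec[len("user['"):-len("']")]
                if ref not in user:
                    continue  # dependency not resolved yet
                user[name] = user[ref]
            else:
                user[name] = spec
            del pending[name]
            progressed = True
        if not progressed:
            raise ValueError(f"unresolvable defaults: {sorted(pending)}")
    return user

resolve_defaults({"foo": 0.1}, {"bar": "user['foo']", "baz": "user['bar']"})
# {'foo': 0.1, 'bar': 0.1, 'baz': 0.1}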

Deprecation mechanism for sections or keywords

Some programs allow different names to be used for the same keyword/section. For example, CONVTHR might be just as valid as CONVERGENCE_THRESHOLD (or something similar). Do we want to allow for this, and if so, how would it look? An alias field accepting a list of aliases, as sketched below?
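
If we go the alias route, the template field might look like this (the aliases field is hypothetical, sketched in the YAML style of the existing templates):

keywords:
  - name: convergence_threshold
    type: float
    docstring: "Convergence threshold"
    # hypothetical field: alternative accepted spellings
    aliases:
      - convthr
      - conv_threshold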

Consider adding Union type

I was thinking about cases where a given keyword might be either a list or just a scalar, that is, Union[int, List[int]] (or similar). Do you think this is a good idea? I don't know exactly how it would look in the implementation.
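
A minimal sketch of what a union check could look like (hypothetical helper, not parselglossy's type_matches; element types of lists are not checked here):

def matches_union(value, allowed):
    # accept the value if it matches any member of the union
    return any(isinstance(value, t) for t in allowed)

matches_union(3, (int, list))       # True: scalar form
matches_union([1, 2], (int, list))  # True: list form
matches_union("a", (int, list))     # False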

Multiple definitions of keywords

  • parselglossy version: 0.3.0

Description

Currently the following is allowed

Foo {
  bar = true
}

Foo {
  bar = false
}

and the last definition takes precedence. This should probably not be allowed.
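
Catching this amounts to refusing duplicates when the parsed sections are assembled into the tree. A plain-dict sketch, not the actual parser code:

def add_section(tree, name, contents):
    # refuse a second definition instead of silently overwriting
    if name in tree:
        raise ValueError(f"Section '{name}' defined more than once")
    tree[name] = contents

tree = {}
add_section(tree, "Foo", {"bar": True})
add_section(tree, "Foo", {"bar": False})  # raises ValueError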

Autogenerate list of authors in docs from AUTHORS.rst

We have:

  • __init__.py defines various fields, like __author__, __maintainers__, and __credits__, which are not widely used, I think.
  • AUTHORS.rst, which is more widely recognized and could be used more formally for authorship recognition.
  • the author variable in docs/conf.py, which Sphinx uses.

It would be good to consolidate these, so we only need to update authorship in one place. I personally favor using AUTHORS.rst exclusively, keeping only __version__ and __all__ in __init__.py, and doing some regex magic in conf.py to update the author variable.
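
The regex part could be as simple as this sketch, assuming AUTHORS.rst lists authors as "* Name" bullets (an assumption about the file's format):

import re
from pathlib import Path

# docs/conf.py: derive Sphinx's author variable from AUTHORS.rst
# instead of hard-coding it
authors_rst = Path(__file__).resolve().parent.parent / "AUTHORS.rst"
author = ", ".join(re.findall(r"^\* (.+)$", authors_rst.read_text(), re.MULTILINE))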

Refactor tests

Generation of test cases can get a bit unwieldy, given how many different "axes" there are. Hypothesis can help quite a bit, and we should refactor the other tests to use it.
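
For illustration, a property-based test in the Hypothesis style (a generic sketch, not tied to our test suite):

from hypothesis import given, strategies as st

# Hypothesis generates the test cases for us, instead of us
# enumerating every combination along every axis by hand.
@given(st.lists(st.floats(allow_nan=False)))
def test_reverse_is_involutive(xs):
    assert list(reversed(list(reversed(xs)))) == xs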

Input parsing is silent on some type errors

  • parselglossy version: 0.7.0
  • Python version: 3.8.5
  • Operating System: ubuntu20.04

Description

The following type error in the input file (getkw grammar)

some_int = a

correctly spits out the error message

Actual (str) and declared (int) types do not match.

but if the leading character is a number

some_int = 1a

then there is a silent exit, and all input parameters defined below this point in the input file are not parsed and thus get their default values. This means the user will not be notified unless some of those lower keywords lack defaults.
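
For illustration, this is typical pyparsing behavior: without parseAll=True the parser matches a valid prefix and silently stops at the trailing garbage (a generic sketch, not the actual getkw grammar):

import pyparsing as pp

some_int = pp.pyparsing_common.signed_integer
some_int.parseString("1a")                 # succeeds, matching only the leading "1"
some_int.parseString("1a", parseAll=True)  # raises pp.ParseException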

Prepare tutorials

We need to prepare tutorials on how to use the library. We will need different tutorials ranging from basic to advanced uses:
Basic/getting started:

  • Using the library in Python code. This will show how to use actual input files and plain dicts.
  • How to go from installing the library, through adopting one of the available grammars, to hooking it up to a program in a compiled language. The ancillary repository for C++, C, and Fortran should be ready for this purpose.

Intermediate:

  • Composing inputs with different grammars.

Advanced:

  • Defining new grammars. This is needed, but I don't want to write a pyparsing tutorial.

Arbitrary callbacks on the input tree

Should we add the possibility to perform arbitrary operations on the input tree? At the moment we have callables for defaulting and predicate checking. The former can modify the input tree, but only to fill in default values; the latter are only allowed to return booleans.
@stigrj has a really good use case: modifying an entry in the dictionary to a value with an entirely different type (reading the molecule from a multiline string into a dictionary of lists); see the sketch below.
We could add an actions field to keywords: a hook to run arbitrary operations after predicate checking. This somewhat throws the whole type-checking machinery out the window, but gives users of the library a lot of flexibility. I can see the argument against: parselglossy parses and gives you an input tree; what you do with it afterwards is up to you, and if you screw it up, it's on you.
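
For concreteness, the kind of action callback the use case above would need (a hypothetical function; the actions hook itself is only a proposal):

def parse_coords(value: str) -> dict:
    # turn a multiline coordinate string into a dictionary of lists
    atoms, positions = [], []
    for line in value.strip().splitlines():
        symbol, *xyz = line.split()
        atoms.append(symbol)
        positions.append([float(x) for x in xyz])
    return {"atoms": atoms, "positions": positions}

parse_coords("He 0.0 0.0 0.0")
# {'atoms': ['He'], 'positions': [[0.0, 0.0, 0.0]]}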

CLI with minimal/none defaults and writing output to stdout

The CLI would require all input files to be provided explicitly and would assume no defaults.

Motivation:

  • explicit is better than implicit
  • less surprising
  • more flexibility for users to name/place files where they like
  • when writing output to stdout, we do not need to check whether we overwrite files
  • simpler code

This would require a bit more typing, but that will be less of a problem for users of the library than the current behavior is for developers during debugging.

Consolidate specification of dependencies

We have dependencies for development (specified in Pipfile), testing (installed with pip on CI services, see the YAML files), and deployment (installed with pip install parselglossy).
This is problematic as we can easily lose track of dependencies in this way.

Get default value from other keyword

Description

It would be great to have the possibility to use keyword values as defaults for other related keywords. Something like this in the predicate syntax:

keywords:
  - keyword: precision
    type: float
    default: 1.0e-6
  - keyword: threshold
    type: float
    default: "value = 10 * input_dict['precision']"

Add a Path type

In the discussion over #51 it emerged that filesystem paths should be allowed richer semantics than those afforded by simple strings.
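
For illustration, a richer path check could build on pathlib (hypothetical helper and semantics, not a proposed API):

from pathlib import Path

def check_path(value: str, must_exist: bool = False) -> Path:
    # richer semantics than a plain string: user expansion plus an
    # optional existence check
    p = Path(value).expanduser()
    if must_exist and not p.exists():
        raise ValueError(f"path '{p}' does not exist")
    return p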

Initial Update

The bot created this issue to inform you that pyup.io has been set up on this repo.
Once you have closed it, the bot will open pull requests for updates as soon as they are available.

Documentation field name: "docstring" vs "documentation" vs something else

With #42 I renamed documentation to docstring. @bast commented: "would someone who has never heard of docstrings know what this is? Is documentation more intuitive? Also docstrings are really for documentation extracted out of source code and I am not sure whether the input blueprint counts as source code."
The only reason I did the rename is that I consistently misspell "documentation" when coding, and I think it's a bit too long a name for a field.
Let us discuss and agree on something.

Not sure how to run the script/generator

This is what I tried:

$ poetry install -v
$ poetry run parselglossy --help

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 961, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 973, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'parseglossy'

Then (not knowing poetry well enough) I created a new folder and a new venv, and pip installed from the source folder; then:

$ parselglossy --help

Traceback (most recent call last):
  File "/home/user/tmp/foo/venv/bin/parselglossy", line 5, in <module>
    from parseglossy.cli import cli
ModuleNotFoundError: No module named 'parseglossy'

Validation of empty lists

type_matches([], 'List[T]') is true for any underlying type T. Should type_matches throw on empty lists?
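
This is the usual vacuous truth of element-wise checks; for illustration:

# an empty list vacuously satisfies any element-type check
all(isinstance(x, int) for x in [])  # True
all(isinstance(x, str) for x in [])  # True as well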

Strings in getkw grammar

  • parselglossy version: 25e4337
  • Python version: 3.7.0
  • Operating System: Ubuntu-18.10

Description

In good old getkw, strings are accepted both with and without quotes:

  some_string = foo
  another_string = 'bar'
  yet_another_string = "baz"

but the getkw grammar in parselglossy fails to accept quoteless strings if they are part of a section:

some_string = foo          # works
another_string = 'bar'     # works
yet_another_string = "baz" # works

some_section {
  some_string = foo        # fails
  another_string = 'bar'   # works
  yet_another_string "baz" # works
}

Tox and flake8

So, first of all, I am not sure why we use tox. For me this is just one more layer; I would use pytest directly. One motivation may be that it runs pytest and flake8 in one step.

Speaking of which, I get many flake8 warnings (and a non-zero exit) when running locally, but I don't see these on Travis, which confuses me.

Restore Python 3.5 support

#45 drops Python 3.5 support. The lack of dictionary ordering makes testing of the API and CLI quite painful. Support could be restored by sorting keys when serializing JSON and/or by consistently using OrderedDict. We need to assess how important it is to support 3.5.
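
The JSON half of that is a one-liner (standard library behavior, shown for illustration):

import json

d = {"b": 2, "a": 1}
# sort_keys makes the serialization deterministic regardless of the
# in-memory dict ordering, which Python 3.5 does not guarantee
json.dumps(d, sort_keys=True)  # '{"a": 1, "b": 2}'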

Create an example with an input template and input validation

[Pasting here notes from our meeting last week]

  • While the tool will allow "any" grammar, try to express the input structure as YAML.
  • Keywords are required to be documented.
  • Demonstrate a cross-section validation.
  • In validation we can use predicates which allow "any" valid Python.
  • Keywords either have defaults or are required.

JSON and floating point precision

I was not aware of this, but apparently reading from/writing to JSON can mess up the precision of floating-point numbers. I don't really know how to test for and prevent this...
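
A property-based round-trip test would at least detect such losses (a sketch; Python's own json module round-trips floats exactly since the shortest-repr change in 3.1, so any loss would come from other layers):

import json
from hypothesis import given, strategies as st

@given(st.floats(allow_nan=False, allow_infinity=False))
def test_json_roundtrips_floats(x):
    # serialize and parse back; the value should survive exactly
    assert json.loads(json.dumps(x)) == x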

Separate development requirements

The development packages in Pipfile are in two classes:

  1. the stuff that everyone should use to develop parselglossy, and
  2. the stuff that only I use to develop parselglossy (like the language server).

Since it's unfair to inflict the latter upon the world, I need to figure out a way to keep the two separate.

Validation failures

Follow-up from #20
It would be nice to have the validator not fail immediately, but rather keep going and prepare a report of all failures. The user would then get a chance to fix all failures at once, rather than going back and forth (fix one, rerun, fix the next, rerun, and so on).
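
A sketch of the collect-then-report pattern (hypothetical structure, not the actual validator):

def validate(tree, checks):
    # 'checks' maps a keyword name to a predicate on the input tree;
    # run every check and report all failures at once
    failures = [name for name, predicate in checks.items() if not predicate(tree)]
    if failures:
        raise ValueError("Validation failed for: " + ", ".join(failures))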

Repeated keywords in the autogenerated documentation

  • parselglossy version: b28084a
  • Python version: 3.7.0
  • Operating System: Ubuntu-18.10

What I Did

The following template.yml

keywords:
  - name: foo
    type: int
    docstring: foo
  - name: bar
    type: int
    docstring: bar
  - name: baz
    type: int
    docstring: baz

is documented using the CLI command parselglossy doc template.yml to give:

**Keywords**
 :foo: foo

  **Type** ``int``

**Keywords**
 :foo: foo

  **Type** ``int``

 :bar: bar

  **Type** ``int``

**Keywords**
 :foo: foo

  **Type** ``int``

 :bar: bar

  **Type** ``int``

 :baz: baz

  **Type** ``int``

Case-insensitive parsing and validation

As pointed out by @stigrj it would be good to not force case-sensitivity upon the users. I see some options:

  1. Do it before we parse. This would mean the programmer using the library has to normalize the case of the input file, and the validation template has to adhere to this choice. That is, if normalization is uppercase (lowercase), then the validation template must use uppercase (lowercase).
  2. As point 1, but we add an option to api.lex (and expose it in the CLI), something like --case upper (--case lower).
  3. We let the grammar take care of case normalization, by adding a parse action to the various tokens (see the sketch after this list). If we change the parsing library (from pyparsing to lark, for example), this will not carry over.
  4. We do case normalization at the validation level. I think this is the most invasive option of all.
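
Option 3 as a sketch, using pyparsing's parse actions (an illustrative grammar fragment, not the actual getkw grammar):

import pyparsing as pp

keyword = pp.Word(pp.alphas, pp.alphanums + "_")
# normalize the matched token to lowercase at parse time
keyword.setParseAction(lambda toks: toks[0].lower())

keyword.parseString("CONVERGENCE_THRESHOLD")[0]  # 'convergence_threshold'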

Whitespace and newlines in coordinate sections between content and $end are not stripped

  • parselglossy version: 0.7.0
  • Python version: 3.9.2
  • Operating System: MacOS
  • Context: MRChem

Description

Upon parsing coordinate sections such as those used when specifying atomic coordinates

$coords
He 0.0 0.0 0.0
$end

or solvation cavity spheres

$spheres
0.0 0.0 0.0 4.0
$end

the parser ignores all whitespace and newlines between the opening tag (e.g. $coords) and the actual content, but does not ignore whitespace and newlines between the content and $end. As a result, the following sections are not parsed identically:

$coords
He 0.0 0.0 0.0
$end

$coords
He 0.0 0.0 0.0
               $end

$coords
He 0.0 0.0 0.0$end

These result in the following strings, respectively:

"He 0.0 0.0 0.0\n"
"He 0.0 0.0 0.0\n               "
"He 0.0 0.0 0.0"

The expected output is (at least to me) the last one in all three cases. This could become a bit problematic when the user indents these sections (which is very common) and some type of sanity checking is performed on the data. Consider the following:

lines = user_dict['Molecule']['coords'].splitlines()
print(lines)

which, for the three examples, prints:

["He 0.0 0.0 0.0"]
["He 0.0 0.0 0.0", "              "]
["He 0.0 0.0 0.0"]

The middle example ends up with an extra, whitespace-only list element. strip()ing before split()ing fixes the issue on the user side, but parselglossy should probably strip all extra whitespace under the hood.
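
Under-the-hood stripping could be a one-line parse action on the block content (a sketch assuming the body between the tags arrives as a single token, which may not match the actual grammar):

import pyparsing as pp

content = pp.SkipTo(pp.Literal("$end"))
# strip leading/trailing whitespace and newlines from the raw block
content.setParseAction(lambda toks: toks[0].strip())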

Omitting sections with all defaults

  • parselglossy version: b51e249
  • Python version: 3.7.0
  • Operating System: Ubuntu-18.10

Description

It seems like it's not possible to omit sections entirely even if all their keywords have default values.

What I Did

The following input.yml

foobar: false

validated with this template.yml

keywords:
  - keyword: foobar
    type: bool
    documentation: |
      foobar
sections:
  - section: foo
    keywords:
      - keyword: bar
        type: bool
        default: true
        documentation: |
          foo.bar

gives the following error:

    input_dict = validate_node(input_dict, template_dict)
  File "parselglossy/validate.py", line 168, in validate_node
    input_section = input_dict[section]
KeyError: 'foo'
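
A plausible fix is to fall back to an empty section in validate_node, so that defaults can still be filled in (an assumption about the fix, not the actual change):

# instead of: input_section = input_dict[section]
input_section = input_dict.get(section, {})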

pydantic for validation

There exists this neat library, https://pydantic-docs.helpmanual.io, that could help us with validation. At a glance, the advantage seems to be that defining new data types amounts to adding a class with decorated methods. This could certainly help in making parselglossy customizable, and possibly solve #51. The disadvantage is that it requires Python 3.6+.
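
For a taste, a sketch with pydantic's (v1-era) validator decorator, reusing the precision/threshold example from above (a hypothetical model, not an integration proposal):

from pydantic import BaseModel, validator

class Options(BaseModel):
    precision: float = 1.0e-6
    threshold: float = None

    @validator("threshold", pre=True, always=True)
    def threshold_from_precision(cls, v, values):
        # derive the default from another keyword, cf. the
        # "Get default value from other keyword" issue above
        return 10 * values["precision"] if v is None else v

Options().threshold  # 1e-05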
