Hi,
I have used Pyleri to create a grammar describing a text file format.
On small files (say, under a few hundred kilobytes) it is reasonably fast.
On larger files, however, parsing becomes prohibitively slow: a 13 MB file (about 195k lines) takes roughly 2 hours to parse (on a Xeon 6132).
So, naturally, I tried the export_c function, which generated grammar.c and grammar.h from the Pyleri grammar.
Following the quick start in the README, I created a proof of concept: load a file, parse it, and print the is_valid result. It runs fine on small files (a few hundred kB). With the same 13 MB file as above, it had already been running for 14h31m (edit: it finished after 17h15m30s), albeit on a different CPU, a Xeon E5-2697v2.
For comparison, running Pyleri under PyPy parsed the same file in 3 hours (also on the E5-2697v2).
This was very surprising to me, which is why I'm opening the issue here and not (yet) against pyleri.
I suspect something has quadratic or even worse time complexity.
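One cheap way to confirm that suspicion (a sketch with a stand-in workload, not the real grammar): run the parser on inputs of doubling size and compare the costs; for quadratic behaviour each doubling should cost roughly 4x. Here the parse call is replaced by a deterministic operation counter so the ratios are exact:

```python
# Check how cost scales with input size by doubling the input and
# comparing. A real measurement would time grammar.parse(prefix) on
# prefixes of the actual file; this stand-in returns an operation
# count so the ratios are deterministic.

def quadratic_workload(text):
    """Stand-in for a parser with O(n^2) behaviour: counts pair-wise work."""
    ops = 0
    for i in range(len(text)):
        ops += len(text) - i  # simulates re-scanning the rest of the input
    return ops

def scaling_ratios(workload, sizes):
    """Cost ratio between consecutive sizes (~4 per doubling => quadratic)."""
    costs = [workload("x" * n) for n in sizes]
    return [costs[i + 1] / costs[i] for i in range(len(costs) - 1)]

ratios = scaling_ratios(quadratic_workload, [1000, 2000, 4000, 8000])
# Each ratio comes out close to 4: doubling the input quadruples the cost.
```

With the real grammar, I would wrap `grammar.parse(prefix)` in `time.perf_counter()` instead of counting operations; a ratio near 4 per doubling (rather than near 2) would point at something quadratic in the parse.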
Maybe the problem somehow comes from the grammar itself, though I don't really see how or why.
I've tried rewriting the grammar to make it much smaller and simpler (it still parses the same format, just in a more general way). It didn't help at all; the gain was about 31 seconds.
Running both grammars in Python under cProfile wasn't very revealing, either.
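For reference, this is roughly how I profiled it; sorting by cumulative time and keeping only the top rows is sometimes more revealing than the default output. A minimal stdlib sketch (the parse call in the comment is a placeholder for the real grammar):

```python
import cProfile
import io
import pstats

def profile_report(fn, *args, top=20):
    """Run fn(*args) under cProfile and return a report sorted by cumulative time."""
    prof = cProfile.Profile()
    prof.enable()
    fn(*args)
    prof.disable()
    buf = io.StringIO()
    pstats.Stats(prof, stream=buf).sort_stats("cumulative").print_stats(top)
    return buf.getvalue()

# With the real grammar this would be something like:
#   print(profile_report(grammar.parse, file_contents))
```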
Pretty please, is there any way out of this?
Thanks a lot!