Hi,
I have used Pyleri to create a grammar describing a text file format.
On small files (say, under a few hundred kilobytes) it is reasonably fast.
On larger files, however, parsing becomes prohibitively slow: a 13 MB file (about 195k lines) takes roughly 2 hours to parse (on a Xeon 6132).
So, naturally, I tried the export_c function, which generated grammar.c and grammar.h from the Pyleri grammar.
Following the quick start in the README, I created a proof of concept: load a file, parse it, and print the is_valid result. It runs fine on small files (a few hundred kB). With the same 13 MB file as above, it had already been running for 14h31m (edit: it finished after 17h15m30s), albeit on a different CPU, a Xeon E5-2697v2.
For comparison, running Pyleri under PyPy parsed the same file in 3 hours (also on the E5-2697v2).
This was very surprising to me, which is why I'm opening the issue here and not (yet) against pyleri.
I suspect something has quadratic or even worse time complexity.
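One cheap way to confirm that suspicion (a sketch with a stand-in workload, not the real grammar): run the parser on inputs of doubling size and compare the costs; for quadratic behaviour each doubling should cost roughly 4x. Here the parse call is replaced by a deterministic operation counter so the ratios are exact:

```python
# Check how cost scales with input size by doubling the input and
# comparing. A real measurement would time grammar.parse(prefix) on
# prefixes of the actual file; this stand-in returns an operation
# count so the ratios are deterministic.

def quadratic_workload(text):
    """Stand-in for a parser with O(n^2) behaviour: counts pair-wise work."""
    ops = 0
    for i in range(len(text)):
        ops += len(text) - i  # simulates re-scanning the rest of the input
    return ops

def scaling_ratios(workload, sizes):
    """Cost ratio between consecutive sizes (~4 per doubling => quadratic)."""
    costs = [workload("x" * n) for n in sizes]
    return [costs[i + 1] / costs[i] for i in range(len(costs) - 1)]

ratios = scaling_ratios(quadratic_workload, [1000, 2000, 4000, 8000])
# Each ratio comes out close to 4: doubling the input quadruples the cost.
```

With the real grammar, I would wrap `grammar.parse(prefix)` in `time.perf_counter()` instead of counting operations; a ratio near 4 per doubling (rather than near 2) would point at something quadratic in the parse.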
Maybe the problem somehow comes from the grammar itself, though I don't really see how or why.
I've tried rewriting the grammar to make it much smaller and simpler (it still parses the same format, just in a more general way). It didn't help at all; the gain was about 31 seconds.
Running both grammars in Python under cProfile wasn't very revealing, either.
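For reference, this is roughly how I profiled it; sorting by cumulative time and keeping only the top rows is sometimes more revealing than the default output. A minimal stdlib sketch (the parse call in the comment is a placeholder for the real grammar):

```python
import cProfile
import io
import pstats

def profile_report(fn, *args, top=20):
    """Run fn(*args) under cProfile and return a report sorted by cumulative time."""
    prof = cProfile.Profile()
    prof.enable()
    fn(*args)
    prof.disable()
    buf = io.StringIO()
    pstats.Stats(prof, stream=buf).sort_stats("cumulative").print_stats(top)
    return buf.getvalue()

# With the real grammar this would be something like:
#   print(profile_report(grammar.parse, file_contents))
```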
Pretty please, is there any way out of this?
Thanks a lot!