vlasovskikh / funcparserlib Goto Github PK

View Code? Open in Web Editor NEW

338.0 20.0 38.0 2.13 MB

Recursive descent parsing library for Python based on functional combinators

Home Page: https://funcparserlib.pirx.ru

License: MIT License

Makefile 0.42% Python 99.58%

parsing parser-combinators functional-programming python

funcparserlib's People

Contributors

Stargazers

Watchers

funcparserlib's Issues

Positions in lexer error ouput are 0-based instead of 1-based

Usually text positions are 1-based. A text starts with line 1, position 1.

The lexer from the current trunk uses 1-based lines, but 0-based positions. 
This fact creates confusion in error messages.

Original issue reported on code.google.com by andrey.vlasovskikh on 14 Mar 2010 at 10:11

Think about switching from exceptions to return values to indicate no parse errors

It could be something like `Maybe a` or `Either a Error` in Haskell. Maybe
analyzing return values will perform better than exceptions while keeping
the code clean.

Original issue reported on code.google.com by andrey.vlasovskikh on 6 Oct 2009 at 8:54

Write classic API docs

The documentation should be written in Sphinx and should look like all other 
Python docs.

Original issue reported on code.google.com by andrey.vlasovskikh on 26 May 2011 at 11:32

Write nicer guides: short and sweet

Current tutorials are too big and outdated.

To use as exampels: nested brackets, Lisp S-Exprs and JSON.

Format: rst.

Original issue reported on code.google.com by andrey.vlasovskikh on 26 May 2011 at 11:30

Bug in left recursion detection for JSON parser example

Tests for JSON parser fail with false left recursion `GrammarError`.

Original issue reported on code.google.com by andrey.vlasovskikh on 27 May 2011 at 8:19

Python's ** is right-associative, so should the tutorial calculator be.

What steps will reproduce the problem?
1. Try to evaluate "3**3**3" in the tutorial calculator parser in 
https://bitbucket.org/vlasovskikh/funcparserlib/src/16ed98522a11620d10f5c3a9d363
82e2e0931c59/doc/Tutorial.md?at=0.3.x
2. Try to evaluate "3**3**3" in Python.

What is the expected output? What do you see instead?
3**3**3 should be 3**(3**3) == 7625597484987, but when treated as 
left-associative, it is (3**3)**3 == 3**(3*3) == 3**9 == 19683

What version of the product are you using? On what operating system?
0.3.6

Please provide any additional information below.

The following code will make the ** operator right-associative.

    def eval_expr_r(lst, z):
        return reduce(lambda s, (x, f): f(x, s), reversed(lst), z)
    eval_r = unarg(eval_expr_r)

    factor = many(primary + pow) + primary >> eval_r

Or simply design the grammar like this:

    @with_forward_decls
    def factor():
        return (
            primary + pow + factor >> (lambda (x,f,y): f(x,y))
            | primary
            )

Original issue reported on code.google.com by [email protected] on 19 Aug 2013 at 4:55

make_tokenizer doesn't deal with binary tokenizers

tokenize = make_tokenizer([
    (u'x', (br'\xff\n',)),
])

tokens = list(tokenize(b"\xff\n"))

throws

  File "/Users/gsnedders/Documents/other-projects/funcparserlib/funcparserlib/funcparserlib/tests/test_parsing.py", line 76, in test_tokenize_bytes
    tokens = list(tokenize(b"\xff\n"))
  File "/Users/gsnedders/Documents/other-projects/funcparserlib/funcparserlib/funcparserlib/lexer.py", line 107, in f
    t = match_specs(compiled, str, i, (line, pos))
  File "/Users/gsnedders/Documents/other-projects/funcparserlib/funcparserlib/funcparserlib/lexer.py", line 91, in match_specs
    nls = value.count(u'\n')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 0: ordinal not in range(128)

match_specs needs to handle unicode and bytes line feed characters.

Add primitive tok(type[, value]) instead of some()

`some` is potentially slower than `tok` and not acceptable for the grammar 
class analysis.

Original issue reported on code.google.com by andrey.vlasovskikh on 26 May 2011 at 10:49

Python 3 compatibility

What steps will reproduce the problem?
1. Try to use with Python 3

What is the expected output? What do you see instead?

Lots of syntax errors -- u'...' is now a syntax error, as is 'except Foo, e'.


OK, so I'm actually willing to do the work on this I think, but what I want to 
know is how concerned you are with earlier Python versions. For instance, it's 
possible to 'from __future__ import unicode_literals' and then change u'...' to 
'...', but only starting with Python 2.6 Are you concerned with supporting 
versions of Python earlier than that?

Original issue reported on code.google.com by [email protected] on 20 Nov 2012 at 4:29

In the Tutorial, don't use "composition" to refer to fmapping a function with a parser

From the tutorial:

See how composition works. We compose a parser some(...) of type Parser(Token, Token) with the function tokval and we get a value of type Parser again, but this time it is Parser(Token, str). Let's put it this way: the set of parsers is closed under the application of >> to a parser and a function of type a -> b.

As a functional programmer, this is confusing since this is an instance of fmapping, not function composition. Also, the tutorial also uses "composition" to refer to composing two parsers

We should be careful and compose parsers using | so that they don't conflict with each other:

To me, the first quote would make more sense if it read:

See how fmapping works. We fmap the function tokval with a parser some(...) of type Parser(Token, Token) and we get a value of type Parser again, but this time it is Parser(Token, str). Let's put it this way: the set of parsers is closed under the application of >> to a parser and a function of type a -> b.

Of course, if someone does not know about functors, this might be confusing.

Optimize lexer by joining all the spec regexps into a giant regexp

Maybe it will run faster.

Original issue reported on code.google.com by andrey.vlasovskikh on 26 May 2011 at 10:45

Write a very short howto as a counterpart to The funcparserlib Tutorial

The funcparserlib Tutorial is a large introduction to the library. One may
want to just start using things, not reading docs. So a much shorter howto
for such a person is needed.

Parsing and counting matched brackets could be used as an example.

Original issue reported on code.google.com by andrey.vlasovskikh on 23 Jul 2009 at 8:21

make_tokenizer tokenisers return generators, Parser expects sequences

from funcparserlib.lexer import make_tokenizer
from funcparserlib.parser import some

tokenize = make_tokenizer([
    (u'x', (ur'x',)),
])

some(lambda t: t.type == "x").parse(tokenize("x"))

results in

Traceback (most recent call last):
  File "/Users/gsnedders/Documents/other-projects/funcparserlib/funcparserlib/funcparserlib/tests/test_parsing.py", line 76, in test_tokenize
    some(lambda t: t.type == "x").parse(tokenize("x"))
  File "/Users/gsnedders/Documents/other-projects/funcparserlib/funcparserlib/funcparserlib/parser.py", line 121, in parse
    (tree, _) = self.run(tokens, State())
  File "/Users/gsnedders/Documents/other-projects/funcparserlib/funcparserlib/funcparserlib/parser.py", line 309, in _some
    if s.pos >= len(tokens):
TypeError: object of type 'generator' has no len()

tokenize("x") is a generator, and you can't call len on a generator.

Make parsing of LL(1) or LL(k) parts of the grammar faster

Possible techniques: predictive parsing, memoization, parallel computations 
(only linear performance improvements though).

Original issue reported on code.google.com by andrey.vlasovskikh on 26 May 2011 at 11:03

Blocked on: #22

Broken "Monadic Parsing in Python" blog link

The Nathan Sanders blog is gone. Found it in the Way Back Machine, though. Replace the link in your README?

https://web.archive.org/web/20120507001413/http://sandersn.com/blog/?tag=/monads

Could not get some information from exceptions

What steps will reproduce the problem?
1. Give invalid data to parsers
2. Catch exceptions from funcparserlib
3. Catched exception does not include parsing error information

What is the expected output? What do you see instead?

a) Instance of LexerError does not have exception message,
   I cannot get error message by str(e).

b) Instance of NoParserError does not have 'place' attribute,
   errored position was included only exception messages.
   So I cannot get errored position directory.

   (LexerError has 'place' attribute)

Original issue reported on code.google.com by i.tkomiya on 15 May 2011 at 6:10

Revisit the token spec

E. g. introduce a class for it in order to set options via named arguments
of the constructor.

Original issue reported on code.google.com by andrey.vlasovskikh on 6 Oct 2009 at 8:38

Rewrite tests from the examples as nosetests

nose it's faster than my scripts and it gives nice overall stats, etc.

Original issue reported on code.google.com by andrey.vlasovskikh on 26 May 2011 at 11:28

Parse errors should say what tokens are legal at the stopped position

Consider the code

from funcparserlib.parser import a
p = a('a') + (a('b') | a('c'))
p.parse("ad")

The parse failure produces a stack of exceptions that ends with "funcparserlib.parser.NoParseError: got unexpected token: d". This message shows what was unexpected, but not what was expected. A better message would be something like "got unexpected token d; expected b or c". Such messages could be built using the name attribute of parsers.

Clean up all the tok()-related stuff in funcparserlib.parser

The new API should be clean and convenient for builtin token types and should 
allow using custom token types.

Original issue reported on code.google.com by andrey.vlasovskikh on 26 May 2011 at 11:08

support py3

I tried to make funcparserlib compatible both python2 and 3.

https://code.google.com/r/aodagx-funcparserlib/source/detail?r=632313a710c3eb898
31478ae717dae5f3d576375&name=py3

please merge that.

Original issue reported on code.google.com by [email protected] on 20 Apr 2013 at 8:08

Write tests for LL(1) optimization

This optimization possibly has bugs (i. e. `_Alt` is incorrectly optimized). 
Tests are needed.What steps will reproduce the problem?
1.
2.
3.

What is the expected output? What do you see instead?


Please use labels and text to provide additional information.

Original issue reported on code.google.com by andrey.vlasovskikh on 26 May 2011 at 11:17

Blocking: #19

Make non-halting error messages easier to grasp

As for now, they're making people think hard.

Original issue reported on code.google.com by andrey.vlasovskikh on 26 May 2011 at 11:24

left_recursive() misses left recursion hidden behind many() and maybe()

Example:

    x = a('x')
    nonhalting = fwd()
    nonhalting.define(maybe(x) + nonhalting + x)
    assert non_halting(nonhalting)

Original issue reported on code.google.com by andrey.vlasovskikh on 26 May 2011 at 11:13

Parser similar to `maybe()`, but without a return value like `skip()`

I'd like to have a parser similar to maybe(), but one that does not return anything (like skip()) if no match is found.

Here is an example:

def optional(p):
    return p | (pure(None) >> _Ignored)

p = a('x') + optional(a('y')) + a('z')

print(p.parse('xyz')) # --> ('x', 'y', 'z')
print(p.parse('xz')) # --> ('x', 'z')

The issue here is that _Ignored is not in the public interface.

Is it possible to write optional() by using only the public interface of funcparserlib?

Some of the documenation files have a non-free license

The doc folder contains multiple files with the license "Creative Commons Attribution-Noncommercial-Share Alike 3.0" which makes those files non-free from a distribution point of perspective and therefore make shipping them with a Linux distribution harder.

Would it be possible to relicense those files or drop them altogether to avoid these licensing issues?

Make funcparserlib imports more compact

For example, instead of:

    from funcparserlib.lexer import make_tokenizer, Token
    from funcparserlib.parser import tok, many, fwd

allow this:

    from funcparserlib import make_tokenizer, Token, tok, many, fwd

Original issue reported on code.google.com by andrey.vlasovskikh on 26 May 2011 at 11:19

Add the changelog file

It is needed.

Original issue reported on code.google.com by andrey.vlasovskikh on 24 Jul 2009 at 7:49

make unittest fails in python2.7

What steps will reproduce the problem?
1." make unittest" fails.

$ LANG=C make test
/usr/bin/python -m unittest discover tests
FE
======================================================================
ERROR: test_many_backtracking (test_parsing.ParsingTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/kohei/devel/debpkg/funcparserlib/funcparserlib-0.3.5+hg~153/tests/test_parsing.py", line 15, in test_many_backtracking
    self.assertEqual(expr.parse(u'xyxyxx'),
  File "/usr/lib/python2.7/dist-packages/funcparserlib/parser.py", line 124, in parse
    raise NoParseError(u'%s: %s' % (e.msg, tok))
NoParseError: no tokens left in the stream: <EOF>

======================================================================
FAIL: test_error_info (test_parsing.ParsingTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/kohei/devel/debpkg/funcparserlib/funcparserlib-0.3.5+hg~153/tests/test_parsing.py", line 30, in test_error_info
    u'cannot tokenize data: 1,6: "f is \u0444"')
AssertionError: u'cannot tokenize data: 1,5: "f is \u0444"' != u'cannot 
tokenize data: 1,6: "f is \u0444"'
- cannot tokenize data: 1,5: "f is \u0444"
?                         ^
+ cannot tokenize data: 1,6: "f is \u0444"
?                         ^


----------------------------------------------------------------------
Ran 2 tests in 0.002s

FAILED (failures=1, errors=1)
make: *** [unittest] Error 1



What is the expected output? What do you see instead?


What version of the product are you using? On what operating system?

funcparserlib branch 0.3.x revision 153
Debian GNU/Linux Sid
Python 2.7.2+

Please provide any additional information below.

make examples are successful.


$ LANG=C make examples
/usr/bin/python -m unittest discover examples
......................
----------------------------------------------------------------------
Ran 22 tests in 0.026s

OK

Original issue reported on code.google.com by [email protected] on 5 Dec 2011 at 3:15

Doc links are unreachable

As of now, all links pointing to 
http://archlinux.folding-maps.org/2009/funcparserlib/ are unreachable - and 
that's most of docs. Can you please consider hosting all docs on the main 
project site?

Original issue reported on code.google.com by [email protected] on 13 Feb 2013 at 7:10

Create full predictive LL(k) parser generator for LL(k) grammars

LL(k) parsing is O(N) while LL(*) is O(2^N).

Original issue reported on code.google.com by andrey.vlasovskikh on 26 May 2011 at 10:53

Prevent passing a universally successful parser to the many combinator

Several people have reported that they wrote parsers that ran forever
entering an infinite loop. The problem is documented in FAQ [1].

Find some way to prevent passing this in runtime (by raising an
exception?), i. e. prevent passing a (standard?) universally successful
parser to the `many` combinator.

  [1]: http://archlinux.folding-maps.org/2009/funcparserlib/FAQ

Original issue reported on code.google.com by andrey.vlasovskikh on 6 Oct 2009 at 8:45

Show warning for non-LL(k) grammars

Try to write a LL(k) grammar detection function (a *non*-LL(k) detection 
function with counter-examples will be nicer to have).

A warning should be user-friendly and helpful for optimizing the grammar.

Original issue reported on code.google.com by andrey.vlasovskikh on 26 May 2011 at 10:59

Python 3 support

funcparserlib should support both Python 2 and 3.

Original issue reported on code.google.com by andrey.vlasovskikh on 22 Nov 2011 at 8:22

Tutorial not working

Hi!

I'm getting an error at line 47: return reduce(lambda s, (f, x): f(s, x), list, z)

It complains about the ( before f

Handle intrinsically circular grammars by parser-combinators decomposed to a set of modules.

In documentation you introduced an example of JSON parser, this format considered to be intrinsically circular on grammar level. So, what if my grammar has circular nature just like json`s one, but parsers spread to several modules, hence I have circular modules imports. Is there any built-in way to handle this?

Add a function for drawing textual parse trees

Such as:

/-- []
    |-- 0
    |   `-- 1
    |-- 1
    |   `-- None
    |-- 2
    |   `-- {}
    |       |-- a
    |       |   `-- True
    |       `-- c
    |           `-- []
    |-- 3
    |   `-- foo
    `-- 4
        `-- []
            |-- 0
            |   `-- 4
            `-- 1
                `-- 5

Original issue reported on code.google.com by andrey.vlasovskikh on 25 Jul 2009 at 11:27

setup.py install --prefix does not create directories and requires root privileges

Setup.py install

  * does not create necessary installation directories in prefix path
  * requires write access to system-wide /usr/lib/python2.5/site-packages
    even when installing to a user-writable prefix
  * is likely to interfere with package manager (due to writing to /usr/lib)

Prefix directory is user-writable (/usr/local/stow). Usually I install
manually built packages there and deploy them with GNU stow. It works for
most of the python packages I have built this was. It does not work for
funcparserlib-0.3.3

Installation log:

    funcparserlib-0.3.3$ python setup.py build
    ...
    funcparserlib-0.3.3$ python setup.py install
--prefix=/usr/local/stow/funcparserlib-0.3.3
    running install
    Checking .pth file support in
/usr/local/stow/funcparserlib-0.3.3/lib/python2.5/site-packages/
    error: can't create or remove files in install directory
    ... [ long explanations follow and suggest to create target directory ]

If I create
/usr/local/stow/funcparserlib-0.3.3/lib/python2.5/site-packages/ manually,
install terminates with an error reporting that prefix is not on PYTHONPATH.

If I define PYTHONPATH, install still terminates with:

    error: could not create 'build/bdist.linux-i686/egg': Permission denied

Then I install it like (I don't like to do it under sudo, but it seems it
requires write access to /usr/lib anyway):

    funcparserlib-0.3.3$ sudo
PYTHONPATH=/usr/local/stow/funcparserlib-0.3.3/lib/python2.5/site-packages/
python setup.py install --prefix=/usr/local/stow/funcparserlib-0.3.3
    running install
    ... [ skipping ]
    creating build/bdist.linux-i686/egg

I think `setup.py install --prefix=anyprefix` should not write anywhere
outside of `anyprefix`. I think it should create all necessary directories
automatically.

Debian/Lenny, Python 2.5.2, Setuptools 0.6c8-4

Original issue reported on code.google.com by s.astanin on 8 Sep 2009 at 12:09

tokenize not defined in funcparserlib.lexer

What steps will reproduce the problem?

>>> from funcparserlib.lexer import *
------------------------------------------------------------
Traceback (most recent call last):
  File "<ipython console>", line 1, in <module>
AttributeError: 'module' object has no attribute 'tokenize'

`tokenize` is commented in `lexer.py` but it is still present in its `__all__`.

Funcparserlib 0.3.3

Original issue reported on code.google.com by s.astanin on 9 Sep 2009 at 7:37

Make a new release

Over in hylang/hy#2026, we're hitting a bug that was fixed 6 years ago, because the last release of funcparserlib on PyPI is nearly 8 years old.

RPython compatible

I'm not sure how difficult this would be
but it would be nice to be able to use 
funcparserlib along with RPython.

Original issue reported on code.google.com by [email protected] on 25 Jul 2013 at 12:25

many() consumes too many tokens in some cases

from funcparserlib.parser import a, many
A = a('A')
B = a('B')
x = many(A + B) + A + A
print x.parse('ABABAA')

This raises a NoParseError("no tokens left in stream") despite being passed a 
valid string.

Looking into it a bit I found that the many() is consuming the penultimate A in 
the string even though it wasn't matched. This is because many returns the 
state from the exception raised when (A + B) failed to parse "AA" - however the 
first "A" has  been consumed at this point as it was a valid first token for 
the failed parse.

Returning the state after the last successful parse fixed it for me:

diff -r 82b1066c6c18 src/funcparserlib/parser.py
--- a/src/funcparserlib/parser.py   Sat Aug 06 15:03:56 2011 +0400
+++ b/src/funcparserlib/parser.py   Thu Nov 17 13:01:21 2011 +0000
@@ -323,7 +323,7 @@
                 v, s = self.p(tokens, s)
                 res.append(v)
         except _NoParseError, e:
-            return res, e.state
+            return res, s

     def ebnf(self):
         return u'{ %s }' % self.p

Original issue reported on code.google.com by [email protected] on 17 Nov 2011 at 1:06

Create a library of typical parsers

It would be nice to have a library of typical parsers, such as a parser of
int or float literals, escaped strings etc.

Typical tokenizer specs could be also useful.

Original issue reported on code.google.com by andrey.vlasovskikh on 6 Oct 2009 at 9:03

Blocked on: #7

README mentions `setuptools` but setup.py doesn't require it

What steps will reproduce the problem?
1. read README, setup.py

Original issue reported on code.google.com by [email protected] on 10 Oct 2009 at 7:23

Not followed by

I've ended up with something like:

header = some(lambda tok: tok.type == "HEADER")
data = some(lambda tok: tok.type == "DATA")
empty_line = some(lambda tok: tok.type == "EMPTY")

body = many(data | empty_line)

segment = header + body
segments = segment + many(skip(empty_line) + segment)

This ends up with an unexpected token error for the second HEADER token with a token stream like HEADER BODY EMPTY HEADER BODY as the EMPTY gets consumed by body and hence it cannot be consumed by segments.

In Haskell I'd solve this with something like body = data <|> (try ( do { empty_line ; notFollowedBy header } )). As far as I can tell, there's nothing comparable to try or notFollowedBy. Is there any sensible way to define such a grammar?

Problem Combining Operator | and Operator +

Consider the following simple and fictive example:

from funcparserlib.parser import many, some, finished

code = "aa"

p1 = some(lambda x: x == 'a')
p2 = many(p1)

p_or = p1 | p2

parsed_tokens = (p3+finished).parse("aa") 
# Error, since it would short-circuit to p1!

I encounter the case when writing a toy parser for a modeling language.

I'm not really familiar with the parsing theory, so I was wondering if this is the intended behavior for a parser combinator?

If it is not, how may I fix the __or__ method of the Parser class (maybe even more)? I thought about propagating the exceptions back to the __or__ method, or maybe you would have a better idea?

[deleted issue]

[deleted issue]

Create a raw parse tree pretty-printer

One may want to just see his parse tree before writing AST-building
"semantic" functions (for `>>` a.k.a. `fmap`).

A parse tree is a list structure (in the Lisp sense) of tokens.

Original issue reported on code.google.com by andrey.vlasovskikh on 6 Oct 2009 at 9:01

Convert documenation into Sphinx format

Sphinx is nice.

Original issue reported on code.google.com by andrey.vlasovskikh on 26 May 2011 at 11:22

Merged into: #29

funcparserlib raise syntax error in python 2.4

What steps will reproduce the problem?
1. Run program using funcparserlib-0.3.4 under python 2.4

What is the expected output? What do you see instead?

raises SyntaxError in funcparserlib/util.py (line 38)

What version of the product are you using? On what operating system?

funcparserlib-0.3.4
python 2.4

Please provide any additional information below.

I made patch for this problem.
This error is caused by ternary operator (It's supported from python2.4)

Original issue reported on code.google.com by i.tkomiya on 8 Jan 2011 at 3:13

Attachments:

funcparserlib_python2.4.patch

vlasovskikh / funcparserlib Goto Github PK

funcparserlib's People

Contributors

Stargazers

Watchers

Forkers

funcparserlib's Issues

Recommend Projects

Recommend Topics

Recommend Org