gallais / idris-tparsec Goto Github PK

View Code? Open in Web Editor NEW

93.0 8.0 10.0 250 KB

TParsec - Total Parser Combinators in Idris

Home Page: https://gallais.github.io/idris-tparsec/

License: GNU General Public License v3.0

Idris 100.00%

idris parser-combinators

idris-tparsec's Introduction

TParsec - Total Parser Combinators in Idris

This is a port of agdarsec to Idris.

idris-tparsec's People

Contributors

Stargazers

Watchers

Forkers

clayrat andrevidela marcosh shriphani stjordanis iomeone eayus timsueberkrueb glmxndr dunhamsteve

idris-tparsec's Issues

Add Examples as tests to Travis build

Simple generic lexer

I have written a small generic lexer as doc for the agda standard library's Trie.
I think it would be nice to have a (better, position-aware) version of this for
the total parser libraries. I'm planning to port it to agdarsec.

https://github.com/agda/agda-stdlib/blob/91f9c5ece802554a87f2106daa97902fefd17218/README/Trie.agda

Port to Idris2

I've pushed an initial attempt to the idris2 branch, the core library itself seems to compile (at least with the latest master version), but the examples still need some work.

Record spans instead of just starting positions

In addition to knowing the starting position of a parsed String, it would be very useful to know the ending position, so I would like to suggest generalising Position to Span.

Refactor RegExp example to use new lexer

Design patterns

It might be interesting to go through this paper to see if any of their patterns are applicable:

https://dl.acm.org/doi/pdf/10.1145/3471874.3472984 Willis, Wu, [2021] "Design Patterns for Parser Combinators"

https://github.com/j-mie6/design-patterns-for-parser-combinators

Boxed combinators?

Using combinators on rec appears to be fairly common, and it seems to always require defining helpers to make types align, e.g.:

andbox : All (Box (Parser' s) :-> Box (Parser' t) :-> Box (Parser' (s, t)))
andbox {s} {t} p q = map2 {a=Parser' s} {b=Parser' t} (\p, q => and p q) p q

nelistbox : All (Box (Parser' s) :-> Box (Parser' (NEList s)))
nelistbox {s} p = map {a=Parser' s} nelist p

Should we maybe provide a separate module for these out of the box :), or there's a more elegant solution?

Add docs to SLTC example a la Arithmetic example

Background

I am very excited about this library. A couple years ago I was all into parser combinators and tried using Lightyear. It made all my functions partial and my program crashed and I had no idea why. I searched around until I found the Agda paper (which I guess you wrote) but was out of energy and gave up. It was beyond me at the time; I was only beginning to put effort toward reading papers. I developed a style for writing total parsers in Idris without combinators. It was heavy on boilerplate but I was that committed to totality. I would write that way even in F#. In many ways I would say the purpose of dependent types is to enable totality.

So I was very excited to see that you had ported your stuff to Idris. I read your Arithmetic example and it was very helpful. I am going to try to read the paper now.

Request

If you wouldn't mind, I would love to see the STLC example documented in the style of the Arithmetic example docs. I would really like to grok and internalize what you've done here but my CS degree didn't prepare me at all for understanding academic computer science so I've been working to understand it on my own time because I have a great hunger for it.

Thank you for making this library.

Buggy STLC example?

While working on https://github.com/gallais/agdarky I've noticed that the parser for STLC I had did
not work on examples of the form \f.\a.\b. f a b which would get parsed as
\f.\a.\b. f (a b) rather than \f.\a.\b. (f a) b.
I have pushed a fix on agdarky but haven't had the time to look back at the example
we have here (and in the agdarsec repo).

The idea behind the fix amounts to:

the head of the neutral term is either a variable, a cut or an infer between parentheses
each argument is either a variable or a whole checkable term in parentheses

Is DecEq necessary for tokens?

As far I see, the DecEq constraint on tokens is only used in Combinators.exact, where it is immediately converted to a boolean. Seems like a weaker Eq constraint could be used here, sparing the user of writing things like counterproofs. Is this an artifact of an Agda port, or did you intend to have this constraint?

Error reporting

This currently seems to be the biggest feature missing from TParsec. Do you have any suggestions on how to implement this better? Maybe by changing the Success structure to be a variant type Result = Error | Success?

Compile error in recordChar: No such variable ignore

Hi, I stumbled upon the following problem when compiling (@clayrat):

erik@munin:~/dev/3rdparty/idris-tparsec (master) $ idris --build TParsec.ipkg
Entering directory `./src'
Type checking ./Relation/Indexed.idr
Type checking ./Relation/Subset.idr
Type checking ./Induction/Nat.idr
Type checking ./Data/SizedDict.idr
Type checking ./Data/Inspect.idr
Type checking ./Data/DList.idr
Type checking ./Data/NEList.idr
Type checking ./TParsec/Result.idr
Type checking ./TParsec/Success.idr
Type checking ./TParsec/Types.idr
./TParsec/Types.idr:109:16-56:
    |
109 | recordChar c = MkTPT $ ignore (modify (mapFst (next c)))
    |                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When checking right hand side of recordChar with expected type
        TParsecT e a m ()

When checking argument runTPT to constructor TParsec.Types.MkTPT:
        No such variable ignore

Use fresh lists to enforce uniqueness of lexer keywords ?

Fresh list is data structure defined by induction recursion for element types with a decidable equality.

Here's a PR to the Agda standard library with an implementation of fresh lists together with documentation & an updated version of the lexer example using fresh lists: agda/agda-stdlib#878

Adding a .ipkg file?

Please add a .ipkg file for ease of installation an building. It would make using your library more convenient.

Add monadic and combinators

I can't find good names for these in Idris: the m is already taken by the Maybe-tainted
combinators.

Return bounded nats for `decimalDigit`

At the moment decimalDigit in TParsec.Combinators.Numbers returns a Nat
when it should return a Nat together with the proof it is less than 10.
I don't think it's worth using Fin here as we do want natural numbers in the
other parsers using decimalDigit.

Add various `parse` functions

The one function in TParsec.Running is only ever useful to write down
examples but not to actually use it. It'd be nice to have specialised runners
for input String vs. Text and output Maybe a or List a.

Experiment with `Lazy` annotations

I am not sure how much of the time spent parsing examples is actually spent
computing the whole Parser mn p a n for the concrete size n of the input
string. If this is indeed happening then we ought to be able to save a lot of
time by adding Lazy annotations for the second arguments of and, alt, etc.

Lines 154 to 156 in 75b2887

 $ rand (char '\\') 

 $ box $ andoptbind anyTok $ \ c => 

 if c /= into 'u' then fail else unicode

After having read a \ we will only accept a u followed by 4 hexadecimal
digits aka. a unicode character. Shouldn't we also accept:

another \
a "
a r, n, or t
?

	$ rand (char '\\')
	$ box $ andoptbind anyTok $ \ c =>
	if c /= into 'u' then fail else unicode

gallais / idris-tparsec Goto Github PK

idris-tparsec's Introduction

TParsec - Total Parser Combinators in Idris

idris-tparsec's People

Contributors

Stargazers

Watchers

Forkers

idris-tparsec's Issues

Background

Request

Recommend Projects

Recommend Topics

Recommend Org