Comments (8)

tailhook commented on June 13, 2024

I'm curious. For my needs, combine works just fine with indentation-aware syntax when I provide a tokenizer that emits indent and unindent tokens. As far as I know, Python handles indentation at the tokenizer level, as do most implementations of YAML.

So the question is: how often do you actually work on raw bytes without any tokenizer? I have always thought a tokenizer is required for any serious work (i.e. for anything more complex than parsing 1+2/3). It's especially important because combine is too slow (to compile), and trying to use it on raw chars will probably make compilation times even worse.
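
The tokenizer-level approach described in this comment could be sketched roughly as follows. This is a minimal, std-only illustration and not the tokenizer from any project mentioned in this thread; the `Token` type and the `tokenize` function are hypothetical names, and the sketch assumes space-only indentation and ignores blank lines.

```rust
/// Tokens produced by the sketch tokenizer (hypothetical type). `Indent` and
/// `Dedent` are synthetic tokens that mark where a block opens and closes.
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Indent,
    Dedent,
    Newline,
    Word(String),
}

/// Splits `src` into tokens, tracking indentation with a stack of widths.
/// Assumptions: indentation uses spaces only, and blank lines are skipped.
pub fn tokenize(src: &str) -> Result<Vec<Token>, String> {
    let mut tokens = Vec::new();
    // The stack holds the indentation width of every currently open block.
    let mut stack: Vec<usize> = vec![0];

    for (line_no, line) in src.lines().enumerate() {
        let content = line.trim_start_matches(' ');
        if content.trim().is_empty() {
            continue; // blank lines never change the block structure
        }
        let width = line.len() - content.len();
        let current = *stack.last().unwrap();

        if width > current {
            // Deeper than the enclosing block: open a new one.
            stack.push(width);
            tokens.push(Token::Indent);
        } else {
            // Close every block that is indented deeper than this line.
            while width < *stack.last().unwrap() {
                stack.pop();
                tokens.push(Token::Dedent);
            }
            // After closing, the line must align with an open block.
            if width != *stack.last().unwrap() {
                return Err(format!("line {}: inconsistent dedent", line_no + 1));
            }
        }

        for word in content.split_whitespace() {
            tokens.push(Token::Word(word.to_string()));
        }
        tokens.push(Token::Newline);
    }

    // Close whatever is still open at the end of the input.
    while stack.len() > 1 {
        stack.pop();
        tokens.push(Token::Dedent);
    }
    Ok(tokens)
}
```

A parser written over this token type would then treat `Indent` and `Dedent` like any other token, much as it would treat opening and closing braces.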

from combine.

Marwes commented on June 13, 2024

I guess this may just be me throwing the idea out there, as I am thinking about moving to an indentation-based syntax for https://github.com/Marwes/embed_lang. I am aware you can handle indentation in the lexer, as I have done it that way for Haskell: https://github.com/Marwes/haskell-compiler/blob/master/src/lexer.rs#L384-L477. It is not really trivial though, so it would be nice if there was a ready way to do it through a library.

I have only done parsers which work directly with char, and while it's not as efficient as using a separate lexer, it does work, and I find it keeps the parser + lexer simple and easy to modify. If or when I actually need a separate lexer, I think it should be easy to move over to that as well. (https://github.com/Marwes/embed_lang/blob/master/parser/src/lib.rs)

It's nice to see someone who has a dedicated lexer though, got any link to that? I am hoping that #37 will make it a bit easier to add a lexer; it would be nice to see an example of a working one.

hawkw commented on June 13, 2024

This would definitely be a nice feature to have. I keep meaning to add support for I-expressions to my Scheme parser, and built-in support for indentation-sensitive syntax would make that a lot less work.

tailhook commented on June 13, 2024

It's nice to see someone who has a dedicated lexer though, got any link to that?

https://github.com/tailhook/marafet/tree/master/marafet_parser/src

It's a little bit shitty, because I've tried to quickly port it to new features (in particular Positioner and Range) without getting a real understanding of how they are supposed to work.

Marwes commented on June 13, 2024

@tailhook Nice, just open an issue if you have trouble understanding Positioner and Range. I should probably add some better docs for those. Anyway, for Range you don't need to invent a dummy type, just use the same type you have for Item. Range is only meant for RangeStream to have a way of storing errors.

Marwes commented on June 13, 2024

Out of scope.

rtfeldman commented on June 13, 2024

I came across this issue because I've been writing a parser with combine (and really enjoying it!) and was wondering about the best approach for making it indentation-sensitive.

I totally get that first-class support for this is out of scope, but I'm wondering if there's a recommended approach?

Thanks for a lovely library!

Marwes commented on June 13, 2024

@rtfeldman gluon is indentation-sensitive, but it is quite a mess, really: https://github.com/gluon-lang/gluon/blob/master/parser/src/layout.rs

The basic idea is that as you scan the input text you emit block open/block close tokens in between the normal, visible tokens. Then the parser is written to match on those tokens.

Other than that, just google around, I think. I don't have any good resources for it, unfortunately.
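
To illustrate the second half of the basic idea above (a parser that matches on block open/block close tokens), here is a small std-only sketch. It is not taken from gluon's `layout.rs` and does not use combine; the `Tok` and `Node` types and the `parse` function are hypothetical, and the token stream is assumed to come from some earlier layout pass that has already inserted the block markers.

```rust
/// Token stream as a layout pass might produce it: visible tokens with
/// synthetic block markers interleaved. All names here are hypothetical.
#[derive(Debug, Clone, PartialEq)]
pub enum Tok {
    BlockOpen,
    BlockClose,
    Ident(String),
}

/// A tiny tree: either a leaf or a nested block of nodes.
#[derive(Debug, PartialEq)]
pub enum Node {
    Leaf(String),
    Block(Vec<Node>),
}

/// Parses nodes until the end of input or an unmatched `BlockClose`.
/// Returns the parsed nodes and the index of the first unconsumed token.
fn parse_items(toks: &[Tok], mut pos: usize) -> Result<(Vec<Node>, usize), String> {
    let mut items = Vec::new();
    while pos < toks.len() {
        match &toks[pos] {
            Tok::Ident(name) => {
                items.push(Node::Leaf(name.clone()));
                pos += 1;
            }
            Tok::BlockOpen => {
                // The nested call stops at the close token for this block.
                let (children, next) = parse_items(toks, pos + 1)?;
                if toks.get(next) != Some(&Tok::BlockClose) {
                    return Err("unclosed block".to_string());
                }
                items.push(Node::Block(children));
                pos = next + 1; // skip the close token
            }
            Tok::BlockClose => break, // leave it for the caller to consume
        }
    }
    Ok((items, pos))
}

/// Parses a whole token stream, rejecting a close token with no open block.
pub fn parse(toks: &[Tok]) -> Result<Vec<Node>, String> {
    let (items, pos) = parse_items(toks, 0)?;
    if pos != toks.len() {
        return Err(format!("unexpected block close at token {}", pos));
    }
    Ok(items)
}
```

Because the block structure is already explicit in the token stream, the parser itself never looks at whitespace or column positions, which is the main attraction of handling layout in a separate pass.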
