
Comments (12)

TehPers commented on May 20, 2024

Yep, that works. Thank you!

from pidgin.

benjamin-hodgson commented on May 20, 2024

Does this work?

IEnumerable<Token> ParseStream(TextReader stream)
{
    var result = this.Parser.Parse(stream);
    while (result.Success)
    {
        yield return result.Value;
        result = this.Parser.Parse(stream);
    }
}

After the call to parser.Parse, the stream is left at the position the parser got up to. (N.B. mixing imperative code with lazy enumerables is kind of risky.)


TehPers commented on May 20, 2024

Sorry for reopening this. I'm still new to this library and I'm trying to use it for a class assignment.

This outputs 6:

private static void Main(string[] args)
{
    var parser = Parser.Digit;
    var input = "121528";

    using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(input)))
    using (var reader = new StreamReader(stream))
    {
        parser.Parse(reader);
        Console.WriteLine(stream.Position);
    }
}

Shouldn't this output 1? I think this is where my issue was coming up.

Edit: for reference, I'm reading input from a file for the assignment.


benjamin-hodgson commented on May 20, 2024

Ah, I think you're right. Because data is pulled from the stream in chunks, the parser potentially (usually) reads ahead of where it actually ends up. I guess the fix would be to rewind the stream (after checking CanSeek) after parsing, to where the parser actually consumed up to.
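A minimal sketch of the rewind idea, using only BCL types rather than Pidgin itself. It assumes a seekable stream and single-byte (ASCII) input, so that the number of characters consumed equals the number of bytes; `ReadAheadDemo` and `ReadThenRewind` are hypothetical names. It also shows that a `StreamReader` on its own buffers ahead of what has been read, independently of any chunking the parser does.

```csharp
using System;
using System.IO;
using System.Text;

static class ReadAheadDemo
{
    // Reads `charsToConsume` characters, then rewinds the underlying stream
    // to just past them. Returns the stream position before and after the rewind.
    public static (long Before, long After) ReadThenRewind(string input, int charsToConsume)
    {
        using var stream = new MemoryStream(Encoding.ASCII.GetBytes(input));
        using var reader = new StreamReader(stream, Encoding.ASCII);

        long start = stream.Position;
        for (int i = 0; i < charsToConsume; i++)
        {
            reader.Read();
        }

        // The reader has already pulled a whole chunk into its buffer.
        long before = stream.Position;

        if (stream.CanSeek)
        {
            // Assumption: one byte per character (ASCII input).
            stream.Position = start + charsToConsume;
            // Drop the reader's stale buffer so the next read starts at the new position.
            reader.DiscardBufferedData();
        }

        return (before, stream.Position);
    }
}
```

For the input "121528" with one character consumed, `ReadThenRewind` should return (6, 1): the position jumps to the end of the stream on the first read and is back at 1 after the rewind. With a multi-byte encoding the offset would have to be computed in bytes rather than characters.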


benjamin-hodgson commented on May 20, 2024

Would it be preferable to return the total count of tokens consumed (and possibly even the SourcePos?) as a property of the Result? Then you can rewind the stream yourself.

Edit: On second thoughts you can already pull the SourcePos using CurrentPos so that's probably not necessary.


TehPers commented on May 20, 2024

I think returning an IEnumerable<T> that can be used to continue reading tokens would be helpful. It could maintain a buffer of tokens that have been read from the input source but not processed and return those first. For example:

IEnumerable<T> GetUnconsumedTokens(T[] buffer, int startIndex) {
    // First hand back the tokens that were buffered but never consumed
    for (int i = startIndex; i < buffer.Length; i++) {
        // You might want to remove the item at some point for memory purposes too, this is just an example
        yield return buffer[i];
    }

    // read from input here, this depends on the input type of course
}

This could be a method on ITokenStream<T> that is invoked and returned as part of the result, maybe? It could also be a method on ParseState<T> that reads input by calling ITokenStream<T>.ReadInto to load tokens one at a time (since they'd just get buffered again later anyway).

I'm a little nervous about relying solely on the number of tokens consumed, because knowing how many tokens were consumed doesn't mean that those tokens can be recovered. Input streams don't need to implement rewinding, and re-enumerating an input IEnumerable<T> just to skip some number of tokens could be an expensive operation.


benjamin-hodgson commented on May 20, 2024

Ah yeah, I quite like that idea. "Here are the tokens which I pulled from the input but didn't actually eat", i.e., the remainder of the ParseState's buffer.


beho commented on May 20, 2024

> Would it be preferable to return the total count of tokens consumed (and possibly even the SourcePos?) as a property of the Result? Then you can rewind the stream yourself.
>
> Edit: On second thoughts you can already pull the SourcePos using CurrentPos so that's probably not necessary.

It might be more convenient to have SourcePos as part of the Result. Otherwise you have to append a CurrentPos parser to every parser, which sort of mixes multiple concerns together. Or is there any other way?


TehPers commented on May 20, 2024

I've been messing with this a bit and I agree that SourcePos should be part of the Result, although not for rewinding. It would be helpful to be able to pass that back into Parse later to tell the parser where the parsing began. For example, let's say you finish parsing at line 2, column 18. You probably want to tell the parser to continue parsing from there to make sure your SourcePos are accurate and you can get accurate error messages (if you rely on SourcePos for that).

Something that comes to mind is that the SourcePos could be tracked in ParseState and updated as parsing occurs, though bookmarks would probably also need to keep track of the SourcePos. This is just an idea; you could just call ComputeSourcePos() at the end, but I think that relies on having all of the tokens you parsed stored in the buffer? Never mind, the parser keeps track of the SourcePos where the buffer starts, so it wouldn't really matter that much, I guess.


benjamin-hodgson commented on May 20, 2024

I've been kicking around the idea of replacing SourcePos with a similar but slightly different concept of a SourcePosDelta, representing the change in source location since the beginning of parsing.

When you get to (e.g.) rendering an error message, you can concretise the SourcePosDelta by adding it to a specific SourcePos, which would be (1, 1) if you started parsing at the beginning of the file, or (1, 1) + previousDelta if you'd already consumed previousDelta-worth of input.
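To make the delta idea concrete, here is a rough sketch of how such a type could behave. `Pos`, `PosDelta`, and `ApplyTo` are hypothetical names for illustration only, not Pidgin's actual API, and the column rule (a newline resets the column count) is an assumption about the intended semantics. It needs C# 10 or later for `record struct`.

```csharp
using System;

// Hypothetical 1-based source position.
readonly record struct Pos(int Line, int Col);

// Hypothetical delta: lines advanced, plus columns advanced on the final line.
readonly record struct PosDelta(int Lines, int Cols)
{
    // Compose two deltas. If the second one crosses a newline,
    // the columns accumulated by the first no longer matter.
    public static PosDelta operator +(PosDelta first, PosDelta next) =>
        next.Lines == 0
            ? new PosDelta(first.Lines, first.Cols + next.Cols)
            : new PosDelta(first.Lines + next.Lines, next.Cols);

    // Concretise the delta against a known starting position.
    public Pos ApplyTo(Pos start) =>
        Lines == 0
            ? new Pos(start.Line, start.Col + Cols)
            : new Pos(start.Line + Lines, 1 + Cols);
}

static class PosDeltaDemo
{
    public static Pos Run()
    {
        var startOfFile = new Pos(1, 1);
        var previousDelta = new PosDelta(1, 17); // first parse ended at line 2, column 18
        var thisDelta = new PosDelta(0, 5);      // second parse consumed 5 more columns
        return (previousDelta + thisDelta).ApplyTo(startOfFile);
    }
}
```

Under these assumptions `PosDeltaDemo.Run()` yields line 2, column 23, which matches applying the two deltas one after the other to (1, 1).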

Thoughts?


benjamin-hodgson commented on May 20, 2024

Code here: 4141044


benjamin-hodgson commented on May 20, 2024

Regarding leftover tokens, I'm thinking it makes sense to make it part of the protocol between the parser and the TokenStream. I've added an OnParserEnd(ReadOnlySpan<TToken>) method to ITokenStream, to which the parser passes any unconsumed tokens. Code here (on the v3 branch): 851dc1c
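As a rough illustration of how a token stream might use such a callback, the sketch below keeps the leftovers and serves them first on the next read. `ISketchTokenStream` is a hypothetical stand-in written for this example; the real ITokenStream on the v3 branch may have different members and signatures. It assumes a runtime where `TextReader.Read(Span<char>)` is available (.NET Core 2.1 or later).

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical interface modelled on the description above, not Pidgin's own.
interface ISketchTokenStream<TToken>
{
    int Read(Span<TToken> buffer);
    void OnParserEnd(ReadOnlySpan<TToken> leftovers);
}

sealed class ResumableReaderStream : ISketchTokenStream<char>
{
    private readonly TextReader _reader;
    // Tokens that were pulled from the reader but handed back unconsumed.
    private readonly Queue<char> _pending = new Queue<char>();

    public ResumableReaderStream(TextReader reader) => _reader = reader;

    public int Read(Span<char> buffer)
    {
        int count = 0;
        // Serve handed-back tokens before touching the reader again.
        while (count < buffer.Length && _pending.Count > 0)
        {
            buffer[count++] = _pending.Dequeue();
        }
        if (count < buffer.Length)
        {
            count += _reader.Read(buffer.Slice(count));
        }
        return count;
    }

    public void OnParserEnd(ReadOnlySpan<char> leftovers)
    {
        // Leftovers come before anything still pending from an earlier parse.
        char[] rest = _pending.ToArray();
        _pending.Clear();
        foreach (char c in leftovers) _pending.Enqueue(c);
        foreach (char c in rest) _pending.Enqueue(c);
    }
}
```

With this shape, a second parse over the same stream object would pick up exactly where the first one stopped consuming, without requiring the underlying input to be seekable.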

