I would love to be able to resume parsing once I finish parsing something. For example

Resumable Parsing (benjamin-hodgson/pidgin)

TehPers commented on May 20, 2024

Yep, that works. Thank you!

from pidgin.

benjamin-hodgson commented on May 20, 2024

Does this work?

IEnumerable<Token> ParseStream(TextReader stream)
{
    var result = this.Parser.Parse(stream);
    while (result.Success)
    {
        yield return result.Value;
        result = this.Parser.Parse(stream);
    }
}

After the call to parser.Parse, the stream is left at the position the parser got up to. (NB: mixing imperative code with lazy enumerables is kind of risky.)
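To illustrate the caveat about mixing imperative code with lazy enumerables, here is a minimal sketch that does not use Pidgin at all. `ReadChars` is a hypothetical stand-in for `ParseStream`: an iterator that pulls from a shared `TextReader`. The iterator body restarts on every enumeration while the reader keeps its position, so enumerating the same sequence twice gives different results.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class LazyPitfall
{
    // Hypothetical stand-in for ParseStream: lazily yields one character
    // per step from a reader whose position is shared, mutable state.
    public static IEnumerable<char> ReadChars(TextReader reader)
    {
        int c;
        while ((c = reader.Read()) != -1)
        {
            yield return (char)c;
        }
    }

    // Enumerates the same lazy sequence three times and reports what each
    // enumeration observed.
    public static string Demo()
    {
        var reader = new StringReader("abc");
        IEnumerable<char> lazy = ReadChars(reader);

        char first = lazy.First();          // consumes 'a' from the reader
        char second = lazy.First();         // a fresh enumeration, but the reader has moved on: 'b'
        string rest = string.Concat(lazy);  // whatever is left: "c"

        return $"{first},{second},{rest}";
    }
}
```

`LazyPitfall.Demo()` returns `"a,b,c"`. If the caller needs a stable view of the results, materialising the sequence once (for example with `ToList()`) before the reader is touched again avoids the surprise.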

TehPers commented on May 20, 2024

Sorry for reopening this. I'm still new to this library and I'm trying to use it for a class assignment.

This outputs 6:

private static void Main(string[] args)
{
    var parser = Parser.Digit;
    var input = "121528";

    using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(input)))
    using (var reader = new StreamReader(stream))
    {
        parser.Parse(reader);
        Console.WriteLine(stream.Position);
    }
}

Shouldn't this output 1? I think this is where my issue was coming up.

Edit: for reference, I'm reading input from a file for the assignment.

benjamin-hodgson commented on May 20, 2024

Ah, I think you're right. Because data is pulled from the stream in chunks, the parser potentially (usually) reads ahead of where it actually ends up. I guess the fix would be to rewind the stream after parsing (after checking CanSeek) to the position the parser actually consumed up to.
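The read-ahead and the proposed rewind can be reproduced without Pidgin. The sketch below uses only `StreamReader`, which keeps its own internal buffer, so some of the read-ahead seen in the program above likely comes from the reader itself as well as from the parser's chunking. `ReadAheadDemo` and its method names are made up for illustration, and the rewind assumes a single-byte encoding (ASCII) so that a character count can be used directly as a byte offset.

```csharp
using System;
using System.IO;
using System.Text;

public static class ReadAheadDemo
{
    // Reads exactly one character through a StreamReader and reports how far
    // the underlying stream advanced as a result.
    public static long PositionAfterOneChar(string input)
    {
        using var stream = new MemoryStream(Encoding.ASCII.GetBytes(input));
        using var reader = new StreamReader(stream, Encoding.ASCII);
        reader.Read();
        return stream.Position; // 6 for "121528": the whole input was buffered
    }

    // Rewinds the stream to just after the characters that were logically
    // consumed, then reads the next character.
    public static char RewindAndReadNext(string input, int charsConsumed)
    {
        using var stream = new MemoryStream(Encoding.ASCII.GetBytes(input));
        using var reader = new StreamReader(stream, Encoding.ASCII);
        reader.Read(); // fills the reader's buffer, reading ahead in the stream

        if (!stream.CanSeek)
        {
            throw new NotSupportedException("Cannot rewind a non-seekable stream.");
        }

        // Assumption: one byte per character, so the char count is the byte offset.
        stream.Position = charsConsumed;
        // Drop the reader's stale buffer so it re-reads from the new position.
        reader.DiscardBufferedData();
        return (char)reader.Read();
    }
}
```

With the input from the example above, `PositionAfterOneChar("121528")` returns `6`, and `RewindAndReadNext("121528", 1)` returns `'2'`. For multi-byte encodings the byte offset would have to be computed from the consumed text rather than taken from the character count.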

benjamin-hodgson commented on May 20, 2024

Would it be preferable to return the total count of tokens consumed ~~(and possibly even the SourcePos?)~~ as a property of the Result? Then you can rewind the stream yourself.

Edit: On second thoughts you can already pull the SourcePos using CurrentPos so that's probably not necessary.

TehPers commented on May 20, 2024

I think returning an IEnumerable<T> that can be used to continue reading tokens would be helpful. It could maintain a buffer of tokens that have been read from the input source but not processed and return those first. For example:

IEnumerable<TToken> GetUnconsumedTokens(TToken[] buffer, int startIndex) {
    // First return the tokens that were read from the input but not consumed
    for (int i = startIndex; i < buffer.Length; i++) {
        // You might want to remove the item at some point for memory purposes too, this is just an example
        yield return buffer[i];
    }

    // read from input here, this depends on the input type of course
}

This could be a method on ITokenStream<T> that is invoked and returned as part of the result, maybe? Alternatively, it could be a method on ParseState<T> that reads input by calling ITokenStream<T>.ReadInto to load tokens one at a time (since they'd get buffered again later anyway). I'm a little nervous about relying solely on the number of tokens consumed, because knowing how many tokens were consumed doesn't mean that those tokens can be recovered. Input streams don't need to implement rewinding, and re-enumerating an input IEnumerable<T> just to skip some number of tokens could be an expensive operation.

benjamin-hodgson commented on May 20, 2024

Ah yeah, I quite like that idea. "Here are the tokens which I pulled from the input but didn't actually eat", i.e., the remainder of the ParseState's buffer.

beho commented on May 20, 2024

> Would it be preferable to return the total count of tokens consumed ~~(and possibly even the SourcePos?)~~ as a property of the Result? Then you can rewind the stream yourself.
>
> Edit: On second thoughts you can already pull the SourcePos using CurrentPos so that's probably not necessary.

It might be more convenient to have SourcePos as part of the Result. Otherwise you have to append the CurrentPos parser to every parser, which mixes multiple concerns together. Or is there another way?

TehPers commented on May 20, 2024

I've been messing with this a bit and I agree that SourcePos should be part of the Result, although not for rewinding. It would be helpful to be able to pass that back into Parse later to tell the parser where the parsing began. For example, let's say you finish parsing at line 2, column 18. You probably want to tell the parser to continue parsing from there to make sure your SourcePos are accurate and you can get accurate error messages (if you rely on SourcePos for that).

Something that comes to mind is that the SourcePos could be tracked in ParseState and updated as parsing occurs. Bookmarks would probably also need to keep track of the SourcePos, though. This is just an idea; you could simply call ComputeSourcePos() at the end ~~but I think that relies on having all of the tokens you parsed stored in the buffer?~~ Never mind, the parser keeps track of the SourcePos where the buffer starts, so it wouldn't really matter that much I guess.

benjamin-hodgson commented on May 20, 2024

I've been kicking around the idea of replacing SourcePos with a similar but slightly different concept of a SourcePosDelta, representing the change in source location since the beginning of parsing.

When you get to (eg) render an error message, you can concretise the SourcePosDelta by adding it to a specific SourcePos, which would be (1, 1) if you started parsing at the beginning of the file or (1, 1) + previousDelta if you'd already consumed previousDelta-worth of input.
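The arithmetic behind this idea can be sketched roughly as follows. `Pos` and `PosDelta` are hypothetical types written for this illustration, not Pidgin's own `SourcePos` or the proposed `SourcePosDelta`, and the sketch assumes that every character advances the column by one and that `'\n'` starts a new line.

```csharp
using System;

// Hypothetical absolute position: 1-based line and column.
public readonly struct Pos
{
    public int Line { get; }
    public int Col { get; }
    public Pos(int line, int col) { Line = line; Col = col; }
}

// Hypothetical relative position: how far the input advanced.
public readonly struct PosDelta
{
    public int Lines { get; }
    public int Cols { get; } // columns advanced on the final line
    public PosDelta(int lines, int cols) { Lines = lines; Cols = cols; }

    // Computes the delta produced by consuming the given text.
    public static PosDelta Of(string consumed)
    {
        int lines = 0, cols = 0;
        foreach (char c in consumed)
        {
            if (c == '\n') { lines++; cols = 0; }
            else { cols++; }
        }
        return new PosDelta(lines, cols);
    }

    // Concretises a delta against a starting position. If the delta crossed a
    // line break, the starting column no longer matters.
    public static Pos operator +(Pos start, PosDelta d) =>
        new Pos(start.Line + d.Lines, (d.Lines == 0 ? start.Col : 1) + d.Cols);

    // Chains two deltas: the second one continues where the first left off.
    public static PosDelta operator +(PosDelta a, PosDelta b) =>
        new PosDelta(a.Lines + b.Lines, (b.Lines == 0 ? a.Cols : 0) + b.Cols);
}
```

For example, consuming one full line followed by 17 characters gives a delta of 1 line and 17 columns; adding that to `new Pos(1, 1)` gives line 2, column 18, the position mentioned earlier in the thread. A later parse that consumes three more characters on the same line chains to a delta of 1 line and 20 columns, which concretises to line 2, column 21.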

Thoughts?

benjamin-hodgson commented on May 20, 2024

Code here: 4141044

benjamin-hodgson commented on May 20, 2024

Regarding leftover tokens, I'm thinking it makes sense to make it part of the protocol between the parser and the TokenStream. I've added an OnParserEnd(ReadOnlySpan<TToken>) method to ITokenStream, to which the parser passes any unconsumed tokens. Code here (on the v3 branch): 851dc1c
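A rough sketch of how such a protocol could behave is below. Only the `OnParserEnd(ReadOnlySpan<TToken>)` signature comes from the comment above; everything else is an assumption made for this example. `IResumableTokenStream<T>` is a simplified stand-in rather than Pidgin's actual `ITokenStream` definition, and `ToyParser` is a deliberately trivial "parser" that reads a chunk of four tokens but consumes only one.

```csharp
using System;
using System.Collections.Generic;

// Simplified stand-in for the protocol; NOT Pidgin's actual ITokenStream.
public interface IResumableTokenStream<T>
{
    int Read(Span<T> buffer);                     // returns the number of tokens written
    void OnParserEnd(ReadOnlySpan<T> leftovers);  // tokens that were read but not consumed
}

public sealed class ResumableEnumeratorStream<T> : IResumableTokenStream<T>
{
    private readonly IEnumerator<T> _source;
    private readonly List<T> _pending = new List<T>();

    public ResumableEnumeratorStream(IEnumerable<T> source)
    {
        _source = source.GetEnumerator();
    }

    public int Read(Span<T> buffer)
    {
        // Serve tokens handed back by a previous parse first...
        int n = Math.Min(buffer.Length, _pending.Count);
        for (int i = 0; i < n; i++) buffer[i] = _pending[i];
        _pending.RemoveRange(0, n);

        // ...then pull fresh tokens from the underlying source.
        while (n < buffer.Length && _source.MoveNext()) buffer[n++] = _source.Current;
        return n;
    }

    public void OnParserEnd(ReadOnlySpan<T> leftovers)
    {
        // Leftovers come before anything still pending from an earlier run.
        _pending.InsertRange(0, leftovers.ToArray());
    }
}

public static class ToyParser
{
    // Reads a chunk of up to four tokens, consumes only the first, and hands
    // the rest back to the stream.
    public static bool TryParseOne<T>(IResumableTokenStream<T> stream, out T value)
    {
        var chunk = new T[4];
        int read = stream.Read(chunk);
        if (read == 0)
        {
            value = default!;
            return false;
        }
        value = chunk[0];
        stream.OnParserEnd(new ReadOnlySpan<T>(chunk, 1, read - 1));
        return true;
    }
}
```

Calling `ToyParser.TryParseOne` repeatedly on a `ResumableEnumeratorStream<char>` over `"121528"` yields each digit exactly once and in order, even though every call reads ahead. Since the stream itself stores the leftovers, this approach does not need the input to be seekable or cheaply re-enumerable, which was the concern raised earlier in the thread.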
