Comments (12)
Yep, that works. Thank you!
from pidgin.
Does this work?
IEnumerable<Token> ParseStream(TextReader stream)
{
    var result = this.Parser.Parse(stream);
    while (result.Success)
    {
        yield return result.Value;
        result = this.Parser.Parse(stream);
    }
}
After the call to parser.Parse, the stream is left at the position the parser got up to. (NB: mixing imperative code with lazy enumerables is kind of risky.)
from pidgin.
Sorry for reopening this. I'm still new to this library and I'm trying to use it for a class assignment.
This outputs 6:
private static void Main(string[] args)
{
    var parser = Parser.Digit;
    var input = "121528";
    using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(input)))
    using (var reader = new StreamReader(stream))
    {
        parser.Parse(reader);
        Console.WriteLine(stream.Position);
    }
}
Shouldn't this output 1? I think this is where my issue was coming up.
Edit: for reference, I'm reading input from a file for the assignment.
from pidgin.
Ah, I think you're right. Because data is pulled from the stream in chunks, the parser potentially (usually) reads ahead of where it actually ends up. I guess the fix would be to rewind the stream (after checking CanSeek) after parsing, to the position the parser actually consumed up to.
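The read-ahead and the rewind described above can be reproduced with the BCL alone. This is a minimal sketch, not Pidgin code: the class and method names are made up for illustration, and it assumes the consumed characters can be converted back to a byte count (true here because the input is UTF-8 with no byte-order mark).

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper for illustration only; it does not use Pidgin.
public static class ReadAheadDemo
{
    // Reads `charsToConsume` chars through a StreamReader, then rewinds the
    // underlying stream to just past the chars that were actually consumed.
    public static (long positionAfterRead, long positionAfterRewind) ConsumeAndRewind(
        string input, int charsToConsume)
    {
        using var stream = new MemoryStream(Encoding.UTF8.GetBytes(input));
        using var reader = new StreamReader(stream, Encoding.UTF8);

        var consumed = new char[charsToConsume];
        int read = reader.Read(consumed, 0, charsToConsume);

        // The reader filled its internal buffer in one chunk, so the stream
        // is already ahead of the characters handed back to the caller.
        long positionAfterRead = stream.Position;

        if (stream.CanSeek)
        {
            // Convert consumed chars back to bytes, seek there, and drop the
            // reader's stale buffer so the next read starts at that position.
            stream.Position = Encoding.UTF8.GetByteCount(consumed, 0, read);
            reader.DiscardBufferedData();
        }

        return (positionAfterRead, stream.Position);
    }
}
```

Calling `ReadAheadDemo.ConsumeAndRewind("121528", 1)` shows the stream position before and after the rewind. Note that this only works for seekable streams, which is why the CanSeek check matters.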
from pidgin.
Would it be preferable to return the total count of tokens consumed (and possibly even the SourcePos?) as a property of the Result? Then you can rewind the stream yourself.
Edit: On second thoughts you can already pull the SourcePos using CurrentPos, so that's probably not necessary.
from pidgin.
I think returning an IEnumerable<T> that can be used to continue reading tokens would be helpful. It could maintain a buffer of tokens that have been read from the input source but not processed, and return those first. For example:
IEnumerable<TToken> GetUnconsumedTokens<TToken>(TToken[] buffer, int startIndex)
{
    for (int i = startIndex; i < buffer.Length; i++)
    {
        // You might want to remove the item at some point for memory purposes too; this is just an example.
        yield return buffer[i];
    }
    // Read from input here; this depends on the input type, of course.
}
This could be a method on ITokenStream<T> that is invoked and returned as part of the result, maybe? It could also just be a method on ParseState<T> that reads input by calling ITokenStream<T>.ReadInto to load tokens one at a time (since they'd just get buffered again later anyway). I'm a little nervous about relying solely on the number of tokens consumed, because knowing how many tokens were consumed doesn't mean that those tokens can be recovered. Input streams don't need to implement rewinding, and it's possible that re-enumerating an input IEnumerable<T> just to skip some number of tokens would be an expensive operation.
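To make the "leftovers first, then keep reading" idea concrete, here is a small self-contained sketch. The names (LeftoverTokens, Remaining, bufferedCount) are hypothetical and not part of Pidgin; it assumes the input is exposed as an IEnumerator<T> that has already been advanced past everything sitting in the buffer.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of Pidgin.
public static class LeftoverTokens
{
    // Yields tokens that were buffered but not consumed, then continues
    // pulling from the same source without re-enumerating it from the start.
    // `bufferedCount` is the number of slots in `buffer` holding valid tokens.
    public static IEnumerable<T> Remaining<T>(
        T[] buffer, int startIndex, int bufferedCount, IEnumerator<T> source)
    {
        for (int i = startIndex; i < bufferedCount; i++)
        {
            yield return buffer[i];
        }

        // The source is already positioned after the buffered tokens.
        while (source.MoveNext())
        {
            yield return source.Current;
        }
    }
}
```

Because the source enumerator is reused rather than restarted, nothing has to be rewound or skipped, which sidesteps the concern about expensive re-enumeration.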
from pidgin.
Ah yeah, I quite like that idea. "Here are the tokens which I pulled from the input but didn't actually eat", i.e., the remainder of the ParseState's buffer.
from pidgin.
> Would it be preferable to return the total count of tokens consumed (and possibly even the SourcePos?) as a property of the Result? Then you can rewind the stream yourself.
> Edit: On second thoughts you can already pull the SourcePos using CurrentPos, so that's probably not necessary.
It might be more convenient to have SourcePos as part of the Result. Otherwise you have to append the CurrentPos parser to every parser, which sort of mixes multiple concerns together. Or is there another way?
from pidgin.
I've been messing with this a bit and I agree that SourcePos should be part of the Result, although not for rewinding. It would be helpful to be able to pass that back into Parse later to tell the parser where the parsing began. For example, let's say you finish parsing at line 2, column 18. You probably want to tell the parser to continue parsing from there, to make sure your SourcePos values are accurate and you can get accurate error messages (if you rely on SourcePos for that).
Something that comes to mind is that the SourcePos could be tracked in ParseState and updated as parsing occurs. Bookmarks would probably also need to keep track of the SourcePos, though. This is just an idea; you could just call ComputeSourcePos() at the end, but I think that relies on having all of the tokens you parsed stored in the buffer? Never mind, the parser keeps track of the SourcePos where the buffer starts, so it wouldn't really matter that much, I guess.
from pidgin.
I've been kicking around the idea of replacing SourcePos with a similar but slightly different concept of a SourcePosDelta, representing the change in source location since the beginning of parsing.
When you get to (e.g.) render an error message, you can concretise the SourcePosDelta by adding it to a specific SourcePos, which would be (1, 1) if you started parsing at the beginning of the file, or (1, 1) + previousDelta if you'd already consumed previousDelta-worth of input.
Thoughts?
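A rough sketch of how such a delta could behave, to make the arithmetic concrete. The Pos and PosDelta types below are hypothetical stand-ins written for this example, not Pidgin's types, and they assume a newline resets the column while every other character advances it by one (tabs are not treated specially).

```csharp
using System;

// Hypothetical types for illustration; not Pidgin's SourcePos/SourcePosDelta.
public readonly struct Pos
{
    public int Line { get; }
    public int Col { get; }
    public Pos(int line, int col) { Line = line; Col = col; }
}

public readonly struct PosDelta
{
    public int Lines { get; }
    public int Cols { get; }
    public PosDelta(int lines, int cols) { Lines = lines; Cols = cols; }

    // Measures how far consuming `text` moves the position.
    public static PosDelta Of(string text)
    {
        int lines = 0, cols = 0;
        foreach (char c in text)
        {
            if (c == '\n') { lines++; cols = 0; }
            else { cols++; }
        }
        return new PosDelta(lines, cols);
    }

    // Composes this delta with one that follows it. If the later delta
    // crossed a newline, the earlier column offset no longer matters.
    public PosDelta Then(PosDelta next) =>
        new PosDelta(Lines + next.Lines, (next.Lines == 0 ? Cols : 0) + next.Cols);

    // Concretises the delta against a starting position. Crossing a newline
    // restarts the column count at 1.
    public Pos ApplyTo(Pos start) =>
        new Pos(start.Line + Lines, (Lines == 0 ? start.Col : 1) + Cols);
}
```

With this shape, the delta for one line break followed by 17 characters applied to (1, 1) lands on line 2, column 18, and a second parse can resume from there by applying its own delta to that position.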
from pidgin.
Code here: 4141044
from pidgin.
Regarding leftover tokens, I'm thinking it makes sense to make it part of the protocol between the parser and the TokenStream. I've added an OnParserEnd(ReadOnlySpan<TToken>) method to ITokenStream, to which the parser passes any unconsumed tokens. Code here (on the v3 branch): 851dc1c
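For readers following along, this is roughly how a stream could honour that protocol. It is a sketch under assumptions: ResumableCharStream and its Read method are hypothetical and do not implement Pidgin's actual ITokenStream; only the OnParserEnd(ReadOnlySpan<TToken>) name comes from the comment above.

```csharp
using System;
using System.IO;

// Hypothetical stand-in for the protocol described above; this class does
// not implement Pidgin's real ITokenStream interface.
public sealed class ResumableCharStream
{
    private readonly TextReader _reader;
    private char[] _leftovers = Array.Empty<char>();

    public ResumableCharStream(TextReader reader) => _reader = reader;

    // Fills `buffer`, serving previously returned tokens before reading
    // anything new from the underlying reader.
    public int Read(Span<char> buffer)
    {
        int fromLeftovers = Math.Min(_leftovers.Length, buffer.Length);
        _leftovers.AsSpan(0, fromLeftovers).CopyTo(buffer);
        _leftovers = _leftovers.AsSpan(fromLeftovers).ToArray();

        if (fromLeftovers == buffer.Length)
        {
            return fromLeftovers;
        }
        return fromLeftovers + _reader.Read(buffer.Slice(fromLeftovers));
    }

    // Called when a parse finishes, with the tokens that were pulled from
    // the input but not consumed. They go in front of any older leftovers.
    public void OnParserEnd(ReadOnlySpan<char> unconsumed)
    {
        var combined = new char[unconsumed.Length + _leftovers.Length];
        unconsumed.CopyTo(combined);
        _leftovers.AsSpan().CopyTo(combined.AsSpan(unconsumed.Length));
        _leftovers = combined;
    }
}
```

The point of the design is that the stream, not the caller, owns the leftovers, so a second parse over the same stream picks up exactly where the first one stopped even when the underlying input cannot seek.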
from pidgin.