domenicquirl / cstree
Concrete Syntax Tree library
License: Apache License 2.0
Implement Send and Sync for SyntaxXY. Add a new test file with tests that share trees across multiple threads by move and by reference. It is important to include tests that hold on to and then drop different syntax nodes as the last variable to access the tree, and to vary the thread this node is dropped in, in order to cover the atomic reference counting of the tree (so we can use these tests to verify the tree gets dropped and de-alloc'd correctly).
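A minimal sketch of what such a drop test could look like, using only std. Tree and Node are hypothetical stand-ins for the real tree and its SyntaxXY handles, and Arc stands in for whatever atomic ref counting the tree actually uses; the point is only the shape of the test, which varies the thread that drops the last handle.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for the tree; records every time it is dropped.
struct Tree {
    drops: Arc<AtomicUsize>,
}

impl Drop for Tree {
    fn drop(&mut self) {
        self.drops.fetch_add(1, Ordering::SeqCst);
    }
}

/// Hypothetical stand-in for a syntax node handle that keeps the tree alive.
#[derive(Clone)]
struct Node {
    tree: Arc<Tree>,
}

/// Builds a tree with two handles, moves one to a spawned thread, and drops
/// the last handle either on the spawned thread or on the calling thread.
/// Returns how many times the tree itself was dropped.
fn drop_count(last_drop_on_spawned_thread: bool) -> usize {
    let drops = Arc::new(AtomicUsize::new(0));
    let root = Node { tree: Arc::new(Tree { drops: Arc::clone(&drops) }) };
    let moved = root.clone();

    if last_drop_on_spawned_thread {
        // The calling thread lets go first, so the spawned thread frees the tree.
        drop(root);
        thread::spawn(move || drop(moved)).join().unwrap();
    } else {
        // The spawned thread lets go first, so the calling thread frees the tree.
        thread::spawn(move || drop(moved)).join().unwrap();
        drop(root);
    }
    drops.load(Ordering::SeqCst)
}
```

In both orders the tree should be dropped exactly once; a count of zero would indicate a leak and a count above one a double free.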
I would also like to have a test that checks the source files for uses of SyntaxNode::clone and checks how many times this method is called in each file. The reason for this is that the public Clone impl will increase the ref count and nodes cloned internally will usually want to use clone_uncounted instead. I think it would be good to have a test that prevents contributors from adding clones "by accident".
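One way such a check could be sketched with plain text matching and only std. The function names and the per-file allowance are hypothetical, and a textual match cannot tell SyntaxNode::clone apart from any other .clone() call, so the allowed count per file would have to be maintained by hand.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Counts non-overlapping occurrences of `.clone()` in `source`.
/// `.clone_uncounted()` is not matched because the pattern includes `()`.
fn count_clone_calls(source: &str) -> usize {
    source.matches(".clone()").count()
}

/// Reads `path` and reports whether it stays within `allowed` clone calls.
/// `allowed` is a hypothetical, manually maintained budget for that file.
fn check_file(path: &Path, allowed: usize) -> io::Result<bool> {
    let source = fs::read_to_string(path)?;
    Ok(count_clone_calls(&source) <= allowed)
}
```

A test would then call check_file for each source file and fail when a file exceeds its budget.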
Following a discussion in Discord about "extending" syntax kinds when the parsed language has an extensive plugin system that allows language modifications, I looked at the green tree structs that store the SyntaxKinds.
Both nodes and tokens currently hold u16, u32, u32 (where the u16 is the SyntaxKind), so in theory it may be possible to just bump the space of SyntaxKind without making those smaller.
Breaking due to Language impls.
so nodes and tokens qualify for niche optimizations!
GreenNodeBuilder::token() that allows passing an interned string as the node's text
Spur
GreenNodeBuilders from owned NodeCaches (and then getting the caches back later on) [78e54d5]
NodeCaches from owned interners (and then getting the interners back later on) [78e54d5]
NodeCache [cc5ea59, 1f06786]
SyntaxNode and ResolvedNode to fmt::Write sources (this would also allow ResolvedNode's Debug and Display implementations to write directly to the formatter instead of creating an intermediate string) [3ef1e7e]
It's currently got a .try_fold_chunks() method but no .fold_chunks() one.
Finding one's way around with_interner and from_interner (between both GreenNodeBuilder and NodeCache) is very much not trivial. Over the past few weeks I've had several discussions with folks trying to plug in a static ThreadedRodeo, which is even more unintuitive because you need to know that Interner is implemented for &ThreadedRodeo. That impl exists precisely to make situations like ours work, where we are generic over Interner and thus have to take a &mut on the with_interner methods, but it's not something people have their eyes out for when searching for what to do in cstree (since it's in lasso).
An additional example on using existing interners would be good to have (unsure if that fits best in the general docs or on the builder/cache), plus a note for the ThreadedRodeo case in particular. Maybe also a full /examples example to show the integration of a static interner, since that seems to be a common use-case.
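To illustrate why an impl on a shared reference makes the static case work, here is a std-only sketch of the pattern. Everything in it (Interner, SharedInterner, Builder, global_interner) is a hypothetical stand-in, not the lasso or cstree API; it only mirrors the shape described above, where the builder takes a &mut to something that implements the interner trait.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

/// Hypothetical interner trait whose methods take `&mut self`.
trait Interner {
    fn get_or_intern(&mut self, text: &str) -> u32;
}

/// Hypothetical thread-safe interner that only needs `&self` internally.
#[derive(Default)]
struct SharedInterner {
    map: Mutex<HashMap<String, u32>>,
}

impl SharedInterner {
    fn intern(&self, text: &str) -> u32 {
        let mut map = self.map.lock().unwrap();
        let next = map.len() as u32;
        *map.entry(text.to_owned()).or_insert(next)
    }
}

/// The key step: the trait is implemented for a shared reference, so a
/// `&mut &SharedInterner` satisfies an API that asks for `&mut impl Interner`.
impl Interner for &SharedInterner {
    fn get_or_intern(&mut self, text: &str) -> u32 {
        (**self).intern(text)
    }
}

/// Hypothetical builder that is generic over the interner and borrows it mutably.
struct Builder<'i, I: Interner> {
    interner: &'i mut I,
}

impl<'i, I: Interner> Builder<'i, I> {
    fn with_interner(interner: &'i mut I) -> Self {
        Builder { interner }
    }

    fn token(&mut self, text: &str) -> u32 {
        self.interner.get_or_intern(text)
    }
}

/// Hypothetical global interner, initialised on first use.
fn global_interner() -> &'static SharedInterner {
    static INTERNER: OnceLock<SharedInterner> = OnceLock::new();
    INTERNER.get_or_init(SharedInterner::default)
}
```

The caller binds the shared reference to a mutable local (let mut shared = global_interner();) and passes &mut shared to Builder::with_interner. The mutable borrow is of the local reference, not of the static itself, which is the part people tend to miss.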
We currently use its RwLock for syntax nodes and their data, but could let users decide about that vs. using std types.
Make Send and Sync impls for SyntaxXY a feature and see which parts of the implementation can be simplified for the single-threaded case.
Off the top of my head, I would start with
NodeData's children, together with SyntaxNode::{read, write},
NodeData's data.
#2 should happen first before we start working on this.
Currently, going from SyntaxKind to its #[repr(u16)] can be done with a cast (as u16), but the reverse direction requires unsafe:
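fn kind_from_raw(raw: cstree::SyntaxKind) -> Self::Kind {
    // Reject raw values beyond the last variant before reinterpreting the bits.
    assert!(raw.0 <= __LAST as u16);
    // SAFETY: relies on SyntaxKind being #[repr(u16)] with contiguous discriminants up to __LAST.
    unsafe { std::mem::transmute::<u16, SyntaxKind>(raw.0) }
}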
This is confusing to users and manual implementations by users might not ensure that the raw value is actually valid (as through the assert above).
We could provide a macro to generate the implementation of Language automatically, including all the necessary precondition checks.
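As a rough idea of what a generated conversion could look like, here is a macro_rules sketch. It swaps the transmute for a lookup table, so no unsafe is needed; the macro name, the Kind enum and the from_raw / into_raw methods are all hypothetical and not part of cstree, and a real macro would emit the Language impl rather than inherent methods.

```rust
/// Hypothetical macro: declares a `#[repr(u16)]` enum together with a table
/// of all variants in declaration order, so that `ALL[i] as u16 == i`.
macro_rules! syntax_kinds {
    ($name:ident { $($variant:ident),+ $(,)? }) => {
        #[derive(Debug, Clone, Copy, PartialEq, Eq)]
        #[repr(u16)]
        enum $name { $($variant),+ }

        impl $name {
            const ALL: &'static [$name] = &[$($name::$variant),+];

            /// Checked conversion from the raw value; `None` if out of range.
            fn from_raw(raw: u16) -> Option<$name> {
                Self::ALL.get(usize::from(raw)).copied()
            }

            /// Conversion to the raw value is a plain cast.
            fn into_raw(self) -> u16 {
                self as u16
            }
        }
    };
}

// Example kinds, for illustration only.
syntax_kinds!(Kind { Whitespace, Ident, Root });
```

Because the table is generated from the same variant list as the enum, the range check and the conversion cannot drift apart the way a hand-written assert and transmute can.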
Prompted by lasso, which we use a lot
cstree::Children implements DoubleEndedIterator, ExactSizeIterator, FusedIterator and many of the default methods for Iterator such as .size_hint(), but cstree::syntax::SyntaxElementChildren and cstree::syntax::SyntaxNodeChildren lack support for these, meaning that code dealing with the wrapper types isn't as efficient as it should be.
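For reference, forwarding these traits through a wrapper iterator is mostly mechanical. The sketch below uses a slice iterator as a stand-in for the inner children iterator; Wrapped and Child are hypothetical names, not the cstree types.

```rust
use std::iter::FusedIterator;
use std::slice;

/// Hypothetical element type produced by the wrapper.
#[derive(Debug, PartialEq)]
struct Child(u32);

/// Hypothetical wrapper around an inner iterator that already supports
/// double-ended, exact-size and fused iteration.
struct Wrapped<'a> {
    raw: slice::Iter<'a, u32>,
}

impl<'a> Iterator for Wrapped<'a> {
    type Item = Child;

    fn next(&mut self) -> Option<Child> {
        self.raw.next().copied().map(Child)
    }

    // Forward the cheap implementations instead of relying on the defaults,
    // which would step through `next` one element at a time.
    fn size_hint(&self) -> (usize, Option<usize>) {
        self.raw.size_hint()
    }

    fn count(self) -> usize {
        self.raw.count()
    }

    fn nth(&mut self, n: usize) -> Option<Child> {
        self.raw.nth(n).copied().map(Child)
    }

    fn last(mut self) -> Option<Child> {
        self.next_back()
    }
}

impl DoubleEndedIterator for Wrapped<'_> {
    fn next_back(&mut self) -> Option<Child> {
        self.raw.next_back().copied().map(Child)
    }
}

// `len()` comes from the default method, which relies on the exact `size_hint` above.
impl ExactSizeIterator for Wrapped<'_> {}

// Sound because the inner iterator is itself fused.
impl FusedIterator for Wrapped<'_> {}
```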
Hey there!
I have been recently rewriting my toy compiler for my toy language and I came across your rowan-inspired library, really liked the concepts, even though I am still very new with all the "green" and "red" trees.
Would it be possible or hard to write a couple of small examples regarding such topics like:
NodeCache?
In advance, thanks for your time and answers, always hoping to find out something for myself!
The changes from rowan to cstree have left method documentation missing or mismatched. I already did the minimal amount of work to update the Readme and crate-level docs, but there are still a lot of holes to fill. In particular, examples/s_expressions is supposed to be a tutorial, but so far I've only made the code work with the changes.
This should also be an opportunity to fill in documentation for things like the SyntaxNode methods where appropriate, since rowan doesn't document a lot of those a priori.
I'm trying to write a parser right now using the nom parser combinator crate, which seems a bit more bottom up than top down, so I was planning on building GreenNodes manually. The docs claim this is possible, but there is no new method for GreenToken while there is in rowan. That being the case, I'm not quite sure how to use GreenNode::new() if I can't make leaves.
Currently, NodeCache is public in green::builder, but is not exported, even though there is GreenNodeBuilder::with_cache, which is public.
And migrate CI to run tests through nextest and generate a JUnit report.
rowan
#[derive(Language)]: #39
#[static("+")] attribute to generate the implementation of static_text from
With the move to cstree, I would like to take the opportunity to go through all dependencies and see if we can clean something up. One thing I'd like to do in particular is to figure out if we can replace smallvec with tinyvec to get away from smallvec's magnetism for CVEs.
It would be nice to have some automatic testing here. Once we have #2, we should also use CI to run the multithreaded tests through a memory sanitizer, so we can verify that the ref counting correctly de-allocs the tree when it is no longer in use.
The serde1 feature (which we should also maybe rename?) has not yet been updated to cstree. The original implementation is fairly straightforward; the main challenge with updating it will probably be the fact that cstree interns the GreenToken strings. So a SyntaxNode alone will not be sufficient to (de-)serialize, there will need to be a way to get a Resolver involved.
in order to allow for more efficient comparisons than resolve_text -> &str plus string comparison.
An interesting thought is extending this to SyntaxNode in the form of preorder_with_tokens, filtering the WalkEvents and comparing token texts with the new method.
API-wise I dislike just returning the Spur key from the tokens, since that breaks abstraction quite a bit. Currently favouring something like a text_eq method for the tokens.
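A small sketch of the idea behind text_eq, with made-up types: comparing interned keys is an integer comparison and keeps the key itself private to the token. Key, Interner and Token here are hypothetical stand-ins, and the comparison is only meaningful when both tokens were interned by the same interner.

```rust
use std::collections::HashMap;

/// Hypothetical interning key; stays hidden behind `Token::text_eq`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Key(u32);

/// Hypothetical minimal interner mapping each distinct string to one key.
#[derive(Default)]
struct Interner {
    keys: HashMap<String, Key>,
}

impl Interner {
    fn intern(&mut self, text: &str) -> Key {
        let next = Key(self.keys.len() as u32);
        *self.keys.entry(text.to_owned()).or_insert(next)
    }
}

/// Hypothetical token that stores only the key of its text.
struct Token {
    text: Key,
}

impl Token {
    /// Equal keys imply equal text, provided both tokens share an interner.
    /// No string is resolved or compared.
    fn text_eq(&self, other: &Token) -> bool {
        self.text == other.text
    }
}
```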
Currently, cstree hard requires a Spur as our interning key, because our GreenToken cannot be generic (due to the type erasure that is happening). This means that we cannot integrate with salsa's InternKeys, which is an issue for folks using both in their project.
While we cannot be actually generic over the key type, we could (perhaps optionally/under a feature) store the more generic usize inside GreenTokenData. This loses 2 bytes on 64-bit targets (going from 6 to 8), but would give users a common interface (together with a compatibility wrapper which has something like unsafe impl<T> lasso::Key for Wrapper<T> where T: salsa::InternKey).
The number one offender in this regard is the default interner, which currently is a lasso::Rodeo<lasso::Spur, std::hash::BuildHasherDefault<fxhash::FxHasher>>. This leads to "fun" error messages such as this one, courtesy of @RDambrosio016:
Once there is a lasso release which contains Kixiron/lasso#19, we can wrap that in our own DefaultInterner type to cut down on the readability issues, which also affect things like inline hints.
@RDambrosio016 has also suggested providing a macro to generate such newtypes for all the SyntaxXY types (and maybe optionally an interner). There are some considerations here for how this interacts with methods returning references to the cstree types, which may introduce issues that are not present if a user instead does a type MySyntaxXY = cstree::SyntaxXY style type definition.