domenicquirl / cstree


Concrete Syntax Tree library

License: Apache License 2.0

Rust 100.00%

cstree's Issues

Multithreaded tests and expanded testing in general

Implement Send and Sync for SyntaxXY. Add a new test file with tests that share trees across multiple threads by move and by reference. It is important to include tests that hold on to and then drop different syntax nodes as the last variable to access the tree, and to vary the thread this node is dropped in, in order to cover the atomic reference counting of the tree (so we can use these tests to verify the tree gets dropped and de-alloc'd correctly).
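A minimal sketch of the kind of test described above, using only std types. `Tree` is a hypothetical stand-in for a syntax tree (it is not a cstree type), and `Arc` stands in for the tree's own atomic reference counting. The point is only to show how a drop counter can verify that the tree is freed exactly once when the last handle is dropped on a different thread.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for a tree; records every drop in a shared counter.
struct Tree {
    drops: Arc<AtomicUsize>,
}

impl Drop for Tree {
    fn drop(&mut self) {
        self.drops.fetch_add(1, Ordering::SeqCst);
    }
}

/// Creates a tree with two handles, releases one on the current thread and
/// the last one on a spawned thread, then returns how often the tree was dropped.
fn drop_last_handle_on_other_thread() -> usize {
    let drops = Arc::new(AtomicUsize::new(0));
    let tree = Arc::new(Tree { drops: Arc::clone(&drops) });
    let last_handle = Arc::clone(&tree);

    // The current thread gives up its handle first ...
    drop(tree);
    // ... so the spawned thread holds the last reference and frees the tree.
    thread::spawn(move || drop(last_handle)).join().unwrap();

    drops.load(Ordering::SeqCst)
}

fn main() {
    println!("tree dropped {} time(s)", drop_last_handle_on_other_thread());
}
```

Variations of the same test would move the final `drop` to the other thread, or share the handle by reference inside `thread::scope`.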

I would also like to have a test that checks the source files for uses of SyntaxNode::clone and checks how many times this method is called in each file. The reason for this is that the public Clone impl will increase the ref count and nodes cloned internally will usually want to use clone_uncounted instead. I think it would be good to have a test that prevents contributors from adding clones "by accident".

Investigate making `SyntaxKind` a `u32`

Following a discussion in Discord about "extending" syntax kinds when the parsed language has an extensive plugin system that allows language modifications, I looked at the green tree structs that store the SyntaxKinds.
Both nodes and tokens currently hold u16, u32, u32 (where the u16 is the SyntaxKind), so in theory it may be possible to just bump the space of SyntaxKind without making those smaller.
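As a rough check of the layout argument, the sketch below compares two hypothetical header structs. The struct and field names are made up for illustration and are not cstree's actual definitions. On common targets where `u32` is 4-byte aligned, the `u16` is followed by padding, so widening it to `u32` should not change the size.

```rust
use std::mem::size_of;

/// Hypothetical header with a 16-bit kind followed by two `u32` fields.
#[allow(dead_code)]
#[repr(C)]
struct HeadWithU16Kind {
    kind: u16,
    text_len: u32,
    extra: u32,
}

/// The same header with the kind widened to 32 bits.
#[allow(dead_code)]
#[repr(C)]
struct HeadWithU32Kind {
    kind: u32,
    text_len: u32,
    extra: u32,
}

fn main() {
    // Padding after the `u16` already rounds the first struct up.
    println!("u16 kind: {} bytes", size_of::<HeadWithU16Kind>());
    println!("u32 kind: {} bytes", size_of::<HeadWithU32Kind>());
}
```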

This would be a breaking change because of existing Language impls.

Make syntax pointers `NonNull`

so nodes and tokens qualify for niche optimizations!
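To illustrate the niche optimization in question: `Option<NonNull<T>>` is guaranteed to have the same size as a raw pointer, whereas `Option<*mut T>` needs extra space for the discriminant. `NodeData` here is a placeholder type, not cstree's real struct.

```rust
use std::mem::size_of;
use std::ptr::NonNull;

/// Placeholder for the data a syntax node points to.
#[allow(dead_code)]
struct NodeData {
    kind: u16,
}

fn main() {
    // A raw pointer may be null, so `Option` has to store a separate tag.
    println!("Option<*mut NodeData>:    {}", size_of::<Option<*mut NodeData>>());
    // `NonNull` can never be null, so `None` is encoded as the null value.
    println!("Option<NonNull<NodeData>>: {}", size_of::<Option<NonNull<NodeData>>>());
}
```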

Wishlist

~~An alternative to GreenNodeBuilder::token() that allows passing an interned string as the node's text~~
Allow all of the various things that interact with interners to be generic over the key type; currently they all use Spur
Allow creating GreenNodeBuilders from owned NodeCaches (and then getting the caches back later on) [78e54d5]
Allow creating NodeCaches from owned interners (and then getting the interners back later on) [78e54d5]
Allow getting mutable and immutable references to the interner stored within a NodeCache [cc5ea59, 1f06786]
Allow debugging and displaying both SyntaxNode and ResolvedNode to fmt::Write sources (this would also allow ResolvedNode's Debug and Display implementations to write directly to the formatter instead of creating an intermediate string) [3ef1e7e]

Add `.fold_chunks()` method to `SyntaxText`

It's currently got a .try_fold_chunks() method but no .fold_chunks() one
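One possible way to layer an infallible fold on top of the fallible one is to instantiate the error type with `std::convert::Infallible`. The sketch below uses a hypothetical `Text` type that stores its chunks in a `Vec`; the real `SyntaxText` and the exact signature of its `.try_fold_chunks()` may differ.

```rust
use std::convert::Infallible;

/// Hypothetical stand-in for a text made up of several string chunks.
struct Text<'a> {
    chunks: Vec<&'a str>,
}

impl<'a> Text<'a> {
    /// Fallible fold over all chunks; stops at the first error.
    fn try_fold_chunks<T, E, F>(&self, init: T, mut f: F) -> Result<T, E>
    where
        F: FnMut(T, &str) -> Result<T, E>,
    {
        let mut acc = init;
        for &chunk in &self.chunks {
            acc = f(acc, chunk)?;
        }
        Ok(acc)
    }

    /// Infallible fold, implemented through `try_fold_chunks` with an error
    /// type that has no values.
    fn fold_chunks<T, F>(&self, init: T, mut f: F) -> T
    where
        F: FnMut(T, &str) -> T,
    {
        match self.try_fold_chunks(init, |acc, chunk| Ok::<T, Infallible>(f(acc, chunk))) {
            Ok(acc) => acc,
            // `Infallible` has no variants, so this arm can never run.
            Err(never) => match never {},
        }
    }
}

fn main() {
    let text = Text { chunks: vec!["foo", "bar"] };
    let total_len = text.fold_chunks(0, |len, chunk| len + chunk.len());
    println!("total length: {total_len}");
}
```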

Improve docs on using existing interners

Finding one's way around with_interner and from_interner (between both GreenNodeBuilder and NodeCache) is very much not trivial. Over the past few weeks I've had several discussions with folks trying to plug in a static ThreadedRodeo, which is even more unintuitive because you need to know that Interner is implemented for &ThreadedRodeo. That impl exists precisely to make situations like ours work, where we are generic over Interner and thus have to take a &mut on the with_interner methods, but it's not something people have their eyes out for when searching for what to do in cstree (since it's in lasso).

An additional example on using existing interners would be good to have (unsure if that fits best in the general docs or on the builder/cache), plus a note for the ThreadedRodeo case in particular. Maybe also a full /examples example to show the integration of a static interner, since that seems to be a common use-case.
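The pattern behind the &ThreadedRodeo impl can be sketched without lasso or cstree. Everything below is a stand-in written for illustration: `Interner`, `SharedInterner` and `Builder` are hypothetical types, not the real APIs. The idea is that a trait method taking `&mut self` can still be implemented for a shared reference to a thread-safe interner, so a builder that wants `&mut I` can be handed a `&mut &SharedInterner`.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Stand-in for an interner trait whose methods take `&mut self`.
trait Interner {
    fn get_or_intern(&mut self, text: &str) -> u32;
}

/// Stand-in for a thread-safe interner that only needs `&self` to intern.
#[derive(Default)]
struct SharedInterner {
    map: Mutex<HashMap<String, u32>>,
}

impl SharedInterner {
    fn intern_shared(&self, text: &str) -> u32 {
        let mut map = self.map.lock().unwrap();
        let next = map.len() as u32;
        let key = *map.entry(text.to_owned()).or_insert(next);
        key
    }
}

/// The crucial impl: the trait is implemented for the *reference* type.
impl Interner for &SharedInterner {
    fn get_or_intern(&mut self, text: &str) -> u32 {
        self.intern_shared(text)
    }
}

/// Stand-in for a builder that is generic over the interner.
struct Builder<'i, I: Interner> {
    interner: &'i mut I,
    tokens: Vec<u32>,
}

impl<'i, I: Interner> Builder<'i, I> {
    fn with_interner(interner: &'i mut I) -> Self {
        Builder { interner, tokens: Vec::new() }
    }

    fn token(&mut self, text: &str) {
        let key = self.interner.get_or_intern(text);
        self.tokens.push(key);
    }
}

/// Builds a few tokens through a shared interner and returns their keys.
fn build_with_shared(shared: &SharedInterner) -> Vec<u32> {
    // A mutable binding to a shared reference: `&mut handle` is `&mut &SharedInterner`.
    let mut handle: &SharedInterner = shared;
    let mut builder = Builder::with_interner(&mut handle);
    builder.token("a");
    builder.token("b");
    builder.token("a");
    builder.tokens
}

fn main() {
    let shared = SharedInterner::default();
    println!("{:?}", build_with_shared(&shared));
}
```

With a static interner the same shape applies: take a reference to the static, bind it mutably, and pass `&mut` of that binding.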

Make `parking_lot` dependency optional

We currently use its RwLock for syntax nodes and their data, but could let users decide between that and the std types.

Feature-gate threadsafe syntax trees

Make Send and Sync impls for SyntaxXY a feature and see which parts of the implementation can be simplified for the single threaded case.
Off the top of my head, I would start with

the ref count,
NodeData's children, together with SyntaxNode::{read, write},
NodeData's data.

#2 should happen first before we start working on this.

Provide a proc macro for `SyntaxKind`

Currently, going from SyntaxKind to its #[repr(u16)] can be done with a cast (as u16), but the reverse direction requires unsafe:

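    fn kind_from_raw(raw: cstree::SyntaxKind) -> Self::Kind {
        // Reject raw values beyond the last variant; transmuting those would be undefined behavior.
        assert!(raw.0 <= __LAST as u16);
        // SAFETY: `SyntaxKind` is `#[repr(u16)]` and the assert above bounds `raw.0` by the last variant.
        unsafe { std::mem::transmute::<u16, SyntaxKind>(raw.0) }
    }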

This is confusing to users, and manual implementations might not ensure that the raw value is actually valid (as the assert above does).

We could provide a macro to generate the implementation of Language automatically, including all the necessary precondition checks.
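As a rough idea of what such generated code could look like, here is a declarative `macro_rules!` sketch (not a proc macro, and not an existing cstree API). The macro name `syntax_kinds` and the methods `from_raw`/`into_raw` are hypothetical. It avoids `unsafe` by looking the raw value up in a table of all variants, which relies on the variants having no explicit discriminants.

```rust
/// Hypothetical macro: defines a `#[repr(u16)]` enum plus checked conversions.
macro_rules! syntax_kinds {
    ($name:ident { $($variant:ident),+ $(,)? }) => {
        #[derive(Debug, Clone, Copy, PartialEq, Eq)]
        #[repr(u16)]
        pub enum $name {
            $($variant),+
        }

        impl $name {
            /// All variants in declaration order, so index == discriminant.
            pub const ALL: &'static [$name] = &[$($name::$variant),+];

            /// Checked conversion from the raw value; `None` if out of range.
            pub fn from_raw(raw: u16) -> Option<Self> {
                Self::ALL.get(raw as usize).copied()
            }

            /// Conversion to the raw value is a plain cast.
            pub fn into_raw(self) -> u16 {
                self as u16
            }
        }
    };
}

syntax_kinds!(Kind { LParen, RParen, Atom, Whitespace, Root });

fn main() {
    println!("{:?}", Kind::from_raw(Kind::Atom.into_raw()));
    println!("{:?}", Kind::from_raw(42));
}
```

A Language impl could then call `from_raw` and decide how to handle `None`, for example by panicking with a descriptive message.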

Update dependencies

Prompted by

publishing of the last release telling me that some sub-sub-sub-dependency to crossbeam is yanked
some recent changes in lasso, which we use a lot

Missing Iterator implementations

cstree::Children implements DoubleEndedIterator, ExactSizeIterator, FusedIterator and many of the default methods for Iterator such as .size_hint(), but cstree::syntax::SyntaxElementChildren and cstree::syntax::SyntaxNodeChildren lack support for these, meaning that code dealing with the wrapper types isn't as efficient as it should be.
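The forwarding could look roughly like the following sketch. `NodeChildren` is a hypothetical wrapper around `std::slice::Iter`, standing in for a wrapper around cstree::Children; the real wrappers also map each item to a syntax node, which is left out here.

```rust
use std::iter::FusedIterator;
use std::slice;

/// Hypothetical wrapper type that delegates to an inner iterator.
struct NodeChildren<'a> {
    inner: slice::Iter<'a, u32>,
}

fn node_children(data: &[u32]) -> NodeChildren<'_> {
    NodeChildren { inner: data.iter() }
}

impl<'a> Iterator for NodeChildren<'a> {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        self.inner.next().copied()
    }

    // Forward the cheap overrides instead of relying on the defaults.
    fn size_hint(&self) -> (usize, Option<usize>) {
        self.inner.size_hint()
    }

    fn nth(&mut self, n: usize) -> Option<u32> {
        self.inner.nth(n).copied()
    }

    fn count(self) -> usize {
        self.inner.count()
    }
}

impl<'a> DoubleEndedIterator for NodeChildren<'a> {
    fn next_back(&mut self) -> Option<u32> {
        self.inner.next_back().copied()
    }
}

// Valid because the inner iterator is exact-sized and fused as well.
impl<'a> ExactSizeIterator for NodeChildren<'a> {}
impl<'a> FusedIterator for NodeChildren<'a> {}

fn main() {
    let data = [1, 2, 3];
    let children = node_children(&data);
    println!("{} children, reversed: {:?}", children.len(), children.rev().collect::<Vec<_>>());
}
```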

Question: Immutable trees and tree modification

Hey there!

I have recently been rewriting my toy compiler for my toy language and came across your rowan-inspired library. I really like the concepts, even though I am still very new to all the "green" and "red" trees 😃.

Would it be possible to write a couple of small examples on topics such as:

How does one partially modify a tree of syntax nodes (or create a partially-reused copy with those modifications)? Let's say, when transforming it from AST to HIR (de-sugaring) or simply changing one identifier in the tree?
How does one reuse the trees during incremental recompilation? Is it related to using a NodeCache?

Thanks in advance for your time and answers; I'm always hoping to figure out something for myself!

Update and fill-in documentation

The changes from rowan to cstree have left method documentation missing or mismatched. I already did the minimal amount of work to update the Readme and crate-level docs, but there are still a lot of holes to fill. In particular, examples/s_expressions is supposed to be a tutorial, but so far I've only made the code work with the changes.

This should also be an opportunity to fill in documentation for things like the SyntaxNode methods where appropriate, since rowan doesn't document a lot of those to begin with.

`new` for GreenToken

I'm trying to write a parser right now using the nom parser combinator crate, which seems a bit more bottom up than top down, so I was planning on building GreenNodes manually. The docs claim this is possible, but there is no new method for GreenToken while there is in rowan. That being the case, I'm not quite sure how to use GreenNode::new() if I can't make leaves.

Export `NodeCache`

Currently, NodeCache is public in green::builder, but is not exported, even though there is GreenNodeBuilder::with_cache, which is public.

Support using `cargo nextest`

And migrate CI to run tests through nextest and generate a JUnit report.

Planning issue for `cstree` 0.12

Missing/wrong documentation for `interning::Key`

Clean up dependencies

With the move to cstree, I would like to take the opportunity to go through all dependencies and see if we can clean something up. One thing I'd like to do in particular is to figure out if we can replace smallvec with tinyvec to get away from smallvec's magnetism for CVEs.

Set up CI

It would be nice to have some automatic testing here. Once we have #2, we should also use CI to run the multithreaded tests through a memory sanitizer, so we can verify that the ref counting correctly de-allocs the tree when it is no longer in use.

Update `serde` implementations

The serde1 feature (which we should also maybe rename?) has not yet been updated to cstree. The original implementation is fairly straightforward; the main challenge with updating it will probably be the fact that cstree interns the GreenToken strings. A SyntaxNode alone will therefore not be sufficient to (de-)serialize, and there will need to be a way to get a Resolver involved.

Allow comparing `SyntaxToken`s by text through the interned text keys

in order to allow for more efficient comparisons than resolve_text -> &str plus string comparison.

An interesting thought is extending this to SyntaxNode in the form of preorder_with_tokens, filtering the WalkEvents and comparing token texts with the new method.

API-wise I dislike just returning the Spur key from the tokens, since that breaks abstraction quite a bit. Currently favouring something like a text_eq method for the tokens.
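A small sketch of what a text_eq style method could look like, using a toy interner. `TextKey`, `Interner` and `Token` are hypothetical stand-ins rather than cstree or lasso types. The key stays private to the token, and the comparison is only meaningful when both tokens were interned by the same interner.

```rust
use std::collections::HashMap;

/// Opaque key handed out by the toy interner.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct TextKey(u32);

/// Toy interner: equal strings always map to the same key.
#[derive(Default)]
struct Interner {
    keys: HashMap<String, TextKey>,
    texts: Vec<String>,
}

impl Interner {
    fn get_or_intern(&mut self, text: &str) -> TextKey {
        if let Some(&key) = self.keys.get(text) {
            return key;
        }
        let key = TextKey(self.texts.len() as u32);
        self.texts.push(text.to_owned());
        self.keys.insert(text.to_owned(), key);
        key
    }

    fn resolve(&self, key: TextKey) -> &str {
        &self.texts[key.0 as usize]
    }
}

/// Stand-in token that stores only the key of its text.
struct Token {
    text: TextKey,
}

impl Token {
    /// Compares texts by key, without resolving either string.
    fn text_eq(&self, other: &Token) -> bool {
        self.text == other.text
    }

    /// The slower path: resolve to `&str` first.
    fn resolve_text<'i>(&self, interner: &'i Interner) -> &'i str {
        interner.resolve(self.text)
    }
}

fn main() {
    let mut interner = Interner::default();
    let a = Token { text: interner.get_or_intern("foo") };
    let b = Token { text: interner.get_or_intern("foo") };
    let c = Token { text: interner.get_or_intern("bar") };
    println!("a == b: {}, a == c: {}", a.text_eq(&b), a.text_eq(&c));
    println!("resolved: {}", a.resolve_text(&interner));
}
```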

Using `cstree` with `salsa`: interning woes

Currently, cstree hard requires a Spur as our interning key, because our GreenToken cannot be generic (due to the type erasure that is happening). This means that we cannot integrate with salsa's InternKeys, which is an issue for folks using both in their project.

While we cannot actually be generic over the key type, we could (perhaps optionally/under a feature) store the more generic usize inside GreenTokenData. This loses 2 bytes on 64bit targets (going from 6 to 8), but would give users a common interface (together with a compatibility wrapper which has something like unsafe impl<T> lasso::Key for Wrapper<T> where T: salsa::InternKey).

Add a named type as the default interner to avoid type name explosion

The number one offender in this regard is the default interner, which currently is a lasso::Rodeo<lasso::Spur, std::hash::BuildHasherDefault<fxhash::FxHasher>>. This leads to "fun" error messages such as this one, courtesy of @RDambrosio016:

Once there is a lasso release which contains Kixiron/lasso#19, we can wrap that in our own DefaultInterner type to cut down on the readability issues, which also affect things like inline hints.

@RDambrosio016 has also suggested to provide a macro to generate such newtypes for all the SyntaxXY types (and maybe optionally an interner). There are some considerations here for how this interacts with methods returning references to the cstree types, which may introduce issues that are not present if a user instead does a type MySyntaxXY = cstree::SyntaxXY style type definition.
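The difference between a newtype and a type alias can be seen with std types alone. In this sketch, `RawInterner` and `DefaultInterner` are hypothetical names and the `HashMap` merely plays the role of the long generic interner type. A type alias is expanded in diagnostics, whereas a newtype shows up under its own short name.

```rust
use std::any::type_name;
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::BuildHasherDefault;

/// Alias: convenient to write, but still the same long type in messages.
type RawInterner = HashMap<String, u32, BuildHasherDefault<DefaultHasher>>;

/// Newtype: a distinct type with a short name that wraps the long one.
#[derive(Default)]
struct DefaultInterner(RawInterner);

impl DefaultInterner {
    /// Methods have to be forwarded by hand (or generated by a macro).
    fn get_or_intern(&mut self, text: &str) -> u32 {
        let next = self.0.len() as u32;
        *self.0.entry(text.to_owned()).or_insert(next)
    }
}

fn main() {
    let mut interner = DefaultInterner::default();
    interner.get_or_intern("foo");
    // The exact strings are not guaranteed, but the alias is printed in full.
    println!("alias:   {}", type_name::<RawInterner>());
    println!("newtype: {}", type_name::<DefaultInterner>());
}
```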