rust-lang / rust-analyzer
A Rust compiler front-end for IDEs
Home Page: https://rust-analyzer.github.io/
License: Apache License 2.0
rustc uses the red-green algorithm to implement on-demand and incremental compilation: https://rust-lang-nursery.github.io/rustc-guide/incremental-compilation.html, and it looks like the perfect basis for Code Analyzer support as well.
However, there's a fundamental difference between how rustc does red-green, and how Code Analyzer works.
The command-line compiler operates as a sequence of separate invocations, so there are distinctly separate phases of updating the root data and calculating queries.
In Code Analyzer, query evaluation and data modification can happen concurrently. Consider the following sequence of events:
- a request comes in which requires running a query against foo.rs (say, to compute and present errors)
- while that query is in flight, a modification to bar.rs arrives

That is, we have an in-progress query on our hands, and a set of changes to apply. What we want is to cancel the in-progress query against foo.rs, apply the changes, and re-run the query against the fresh state.
Here are some options to deal with the problem:

- We can wrap QueryEngine in an RwLock, so that all requests acquire and hold a read lock for the duration of the request, while modifying the root facts requires acquiring the write lock. When we receive an update request, we cancel all pending requests, wait for them to finish, and apply the updates. This is the approach used in IntelliJ. It should be very efficient in the total amount of work done, but can cause slow responses if cancellation is not threaded throughout all of the requests.
- We can wrap QueryEngine in an Arc and use Arc::make_mut to apply updates. This requires QueryEngine to be cheaply cloneable (so, im::HashMap instead of std::HashMap), and needs an intricate operation of cloning the database while queries are storing new results into it.
- We can implement the first solution by making use of the existing interior mutability in QueryEngine. Specifically, we can store an atomic timestamp which is incremented before we apply changes. When we execute queries, we check the current timestamp and auto-cancel the request if it is greater than the request's start time (see the sketch below).
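A minimal sketch of the third option, assuming a hypothetical QueryEngine with interior mutability; all names here are illustrative, not an actual API:

use std::sync::atomic::{AtomicU64, Ordering};

pub struct Canceled;

pub struct QueryEngine {
    // bumped every time the root facts change
    generation: AtomicU64,
    // ... query caches with interior mutability ...
}

impl QueryEngine {
    /// Called right before a batch of changes is applied.
    pub fn bump_generation(&self) {
        self.generation.fetch_add(1, Ordering::SeqCst);
    }

    /// Captured once, at the start of a request.
    pub fn current_generation(&self) -> u64 {
        self.generation.load(Ordering::SeqCst)
    }

    /// Sprinkled throughout query evaluation: auto-cancel the request
    /// if the world has moved on since it started.
    pub fn check_canceled(&self, started_at: u64) -> Result<(), Canceled> {
        if self.current_generation() > started_at {
            Err(Canceled)
        } else {
            Ok(())
        }
    }
}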
It's interesting that these solutions are not optimal, in the sense that if the current query deals exclusively with the foo.rs file and the modifications apply only to the bar.rs file, we lose the results of the query even though they are not affected by the modification.
microsoft/vscode#16221 is the primary tracking/suggestion issue for this.
The exact design of the communication between the client and server for this feature has not been created, blocking a direct proposal in the language server protocol. This repository seems like a good place to prototype and investigate such a feature.
It has been suggested that such a feature could remove any need for explicit parameter naming at call sites in Rust, as linked in my comment in the attached issue.
Sorry for any bad formatting! Written on mobile.
String and char literals are tricky because of escape sequences. The lexer should not interpret escape sequences unless absolutely necessary: it is the job of the validator to check them and to convert them into a real literal value.
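As a rough illustration of this split (names hypothetical, not the actual lexer/validator API): the lexer only records the span of a string token, and a later validation pass walks the text and diagnoses bad escapes:

/// Validate escape sequences in the text of an already-lexed string
/// literal (quotes stripped). Returns human-readable errors.
fn validate_string_escapes(text: &str) -> Vec<String> {
    let mut errors = Vec::new();
    let mut chars = text.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            continue;
        }
        match chars.next() {
            // simple escapes; \x and \u{...} would need extra digit checks
            Some('n' | 't' | 'r' | '\\' | '"' | '\'' | '0' | 'x' | 'u') => {}
            Some(other) => errors.push(format!("invalid escape: \\{}", other)),
            None => errors.push("lone backslash at end of literal".to_string()),
        }
    }
    errors
}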
At the moment, parse errors are passed in the publishDecorations notification, which is ignored when highlighting is off. It probably makes sense to add a new kind of notification, specific to errors, that works independently of highlighting.
Steps to reproduce
> cargo install-code
Compiling tools v0.1.0 (file:///C:/Users/Adolfo/Desktop/libsyntax2/crates/tools)
Finished dev [unoptimized + debuginfo] target(s) in 6.51s
Running `target\debug\tools.exe install-code`
Installing m v0.1.0 (file:///C:/Users/Adolfo/Desktop/libsyntax2/crates/server)
Finished release [optimized + debuginfo] target(s) in 0.44s
Replacing C:\Users\Adolfo\.cargo\bin\m.exe
Error: Io(Os { code: 2, kind: NotFound, message: "The system cannot find the file specified." })
error: process didn't exit successfully: `target\debug\tools.exe install-code` (exit code: 1)
/*
* This doesn't fold
*
*/
It probably should.
Now that #110 has landed we need to do the following:

- improve function resolution (index_resolve does not always pick the correct function) and pick the most appropriate one for what the user has currently typed

Without this feature, you have to run cargo check in your terminal and go back and forth between the terminal and the editor to fix the issues in your code. It would be great to take the output of cargo check and somehow integrate the errors and warnings into the editor itself (i.e. the same way we show syntax errors, for instance).
Looks like we mess up whitespace when generating ast:
It would be cool to avoid that!
https://github.com/rust-analyzer/rust-analyzer/blob/master/crates/ra_syntax/src/ast/generated.rs.tera is the template, and the cargo gen-kinds command regenerates the ast from it.
There are a couple of non-trivial things to explain:

- code generation (cargo gen-kinds)
- testing (cargo gen-tests and inline tests)
- installing and debugging the extension (cargo install-code, RUST_LOG=trace)
See https://github.com/matklad/rust-analyzer/issues/73 for a general description of salsa.
Currently, salsa is pull-based: to freshen a query, we must traverse all of its deps. Even if the deps are fresh, the traversal is still potentially O(N). This is exemplified by the newline-counting test. If there are N files, and we change the contents of one file from foo to bar, the total number of lines does not change, but to figure that out, we need to traverse all N files!
Ideally, when changing a file, we should mark it as dirty, and then, at query time, check just this one file, see that the number of lines hasn't changed, and avoid freshening all other files, bringing the total amount of work to O(1) instead of O(N). This is described in this comment and should be implemented.
Note that with such invalidation, if the total number of lines changes, we'd still need an O(N) scan, but that is probably OK. If the end query is, for example, "the set of items in this file", only some modifications will cause a full recalculation. Finally, I think using monoid-cached trees should make even such costly updates logarithmic?
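To make the idea concrete, here is a minimal sketch of the newline-counting example with dirty-marking (a toy model, not salsa's actual API):

use std::collections::{HashMap, HashSet};

#[derive(Default)]
struct NewlineDb {
    texts: HashMap<u32, String>,
    line_counts: HashMap<u32, usize>,
    total: usize,
    dirty: HashSet<u32>,
}

impl NewlineDb {
    fn set_text(&mut self, file: u32, text: String) {
        self.texts.insert(file, text);
        // O(1): just mark the file dirty, don't recompute anything yet
        self.dirty.insert(file);
    }

    fn total_newlines(&mut self) -> usize {
        // Freshen only the dirty files. If a per-file count is unchanged,
        // nothing that depends on it needs to be recomputed.
        for file in self.dirty.drain() {
            let new = self.texts[&file].matches('\n').count();
            let old = self.line_counts.insert(file, new).unwrap_or(0);
            self.total = self.total - old + new;
        }
        self.total
    }
}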
This is help wanted, because I want someone besides me to look into salsa's code, but keep in mind that this is also E-hard and fun :)
Currently, the VSCode extension consists of a single .ts file full of functions and a bunch of global variables. As we add features, the structure of the code becomes less clear. I think it would be worthwhile to refactor it into something more maintainable (e.g. try to avoid global variables, use multiple files, try to use classes to group related functions).
Additionally, the programming style used is not consistent (e.g. sometimes lines end with ; and sometimes they don't). IMO the best way to solve this is to use tslint and check in CI that there are no warnings.
@matklad if you think this is worthwhile I would like to give it a shot
#127 removed support for folding import groups, but in the discussion it turned out we actually want to have that functionality.
The latest version of the protocol has added file & folder operations: https://microsoft.github.io/language-server-protocol/specification#version_3_13_0, see resourceOperations in WorkspaceEdit. We should use it instead of our own FileSystemEdit. Note that we probably need to preserve the bespoke SourceChange, because it contains cursor_position: something that LSP's edits are currently incapable of expressing.
Parser should handle unions.
To do this, you'll need this function to parse the union contextual keyword:
https://github.com/matklad/libsyntax2/blob/b6f8037a6f8fbcb4f127e1d2b518279650b1f5ea/crates/libsyntax2/src/parser_api.rs#L51-L54
I think it's OK to reuse the STRUCT_DEF SyntaxKind for unions as well, because they have the same syntax as structs; the difference is in semantics.
struct is handled here: https://github.com/matklad/libsyntax2/blob/b6f8037a6f8fbcb4f127e1d2b518279650b1f5ea/crates/libsyntax2/src/grammar/items/nominal.rs#L3
I think the code should be modified to eat either struct_kw or a contextual union, and the call site should be updated accordingly. A rough sketch follows.
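Something along these lines, hedged: at_contextual_kw and bump_remap stand in for whatever the parser API actually provides, and UNION_KW may or may not exist as a token kind:

fn struct_def(p: &mut Parser) {
    assert!(p.at(STRUCT_KW) || p.at_contextual_kw("union"));
    if p.at(STRUCT_KW) {
        p.bump();
    } else {
        // re-interpret the `union` identifier as a keyword token
        p.bump_remap(UNION_KW);
    }
    // ... the rest of the grammar is shared with structs,
    // reusing the STRUCT_DEF SyntaxKind ...
}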
For testing, let's use inline tests. They look like a comment:
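(An illustrative example; the exact comment format is whatever cargo gen-tests expects:)

// test: union_item
// union U { i: i32, f: f32 }
fn struct_def(p: &mut Parser) {
    // ...
}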
Running cargo gen-tests will grep the source for such comments and place the new test cases in test_data/parser/inline.
Running cargo test will then create the "gold" parse tree as a .txt file, which should be examined by hand and committed with the change.
So, we should add such a test comment with struct and union variants, run cargo gen-tests, then cargo test, and that's it?
In principle, it is possible to have nodes of length zero: such a node would be an internal node without children.
However, such nodes are annoying to work with. For example, with empty nodes there may be an arbitrary number of nodes at a given offset (as opposed to at most two for non-empty nodes). So it seems a great idea to forbid such nodes.
That means that care must be taken to parse stuff like use foo::92; without empty nodes: we know that there should be a path segment after ::, but we shouldn't create a node for it unconditionally.
A funny edge case here is an empty file: we'll have to create a file node for it, and it will be empty.
There are several things that the parser self-checks during parsing:
We need to fuzz/property-test the parser to verify that these checks do not fail in practice.
To that end, we need to implement:
We have a curly_block method over here: https://github.com/matklad/libsyntax2/blob/357cd3358167daa38f3ff34d225e1501faff6015/src/parser/event_parser/parser.rs#L179.
This is a higher-order function which eats a balanced paren sequence inside {}, which helps quite a lot with error recovery, because all errors stay contained in a block. However, the higher-order-function form itself does not help much, and I suggest just removing it and writing while loops by hand, like here:
Ideally, the error block should handle { specially (by parsing a balanced token sequence), like it is done in Kotlin:
https://github.com/JetBrains/kotlin/blob/4d951de616b20feca92f3e9cc9679b2de9e65195/compiler/frontend/src/org/jetbrains/kotlin/parsing/KotlinParsing.java#L438
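A hand-written recovery loop of the kind suggested above might look like this (a sketch; Parser, L_CURLY, R_CURLY, and EOF are assumed parser/SyntaxKind names):

fn error_block(p: &mut Parser) {
    assert!(p.at(L_CURLY));
    p.bump(); // eat '{'
    let mut depth: u32 = 1;
    // skip a balanced token sequence, so the error stays inside the block
    while depth > 0 && !p.at(EOF) {
        if p.at(L_CURLY) {
            depth += 1;
        } else if p.at(R_CURLY) {
            depth -= 1;
        }
        p.bump();
    }
}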
The incremental reparsing model of libsyntax2 is curly-block based: for each pair of matching { and }, there's a syntax node for which the curlys are the first and the last child. Currently only expr blocks and struct blocks participate in this algorithm, but it should be extended to all braced blocks. Here's the relevant code:
The relevant tests are here:
It would be nice to use proptest or fuzzing to check incremental reparsing! (cc @killercup on the last one). Here's the top-level reparsing API:
Add an action which takes a block of line comments and "reflows" the text, such that lines are of approximately the same length. This is what M-q does in emacs.
See https://github.com/matklad/rust-analyzer/issues/86 for a smallish subset of this issue.
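A greedy reflow over a block of // comments could look roughly like this (a sketch; the real action would work on comment tokens in the syntax tree rather than raw strings):

/// Re-wrap the text of consecutive `//` comment lines at `max_width`.
fn reflow_comment(lines: &[&str], max_width: usize) -> Vec<String> {
    let words = lines
        .iter()
        .flat_map(|l| l.trim_start().trim_start_matches("//").split_whitespace());
    let mut out: Vec<String> = Vec::new();
    let mut cur = String::from("//");
    for word in words {
        // start a new line if the next word would overflow the current one
        if cur.len() + 1 + word.len() > max_width && cur.len() > 2 {
            out.push(std::mem::replace(&mut cur, String::from("//")));
        }
        cur.push(' ');
        cur.push_str(word);
    }
    if cur.len() > 2 {
        out.push(cur);
    }
    out
}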
For

// foo bar b<cursor is here>az quxx

we select the entire line with extend selection. We should start with just baz.
The relevant code and tests are here: https://github.com/matklad/rust-analyzer/blob/cd9c5f4ab205e092b87be6affe6d7e78d877dbf0/crates/ra_editor/src/extend_selection.rs
Take a look at the libsyntax parser (a link to every method is attached) for "what to port" and at the Kotlin parser for "how to port".
If you want to take care of a method, just check the box and append your username and the PR, like -- @username (PR #12345).
Currently, the recovery for
impl<T: Clone>
impl<T> OnceCell<T> {
}
is bad (see matklad@d76c3dd; we want the second impl to be parsed as an impl).
The problem is that we try to parse a type after impl, and the impl keyword can begin an impl Trait type, so we interpret the second impl as the start of a type instead of bailing out. I think we shouldn't try to parse an impl Trait type in this position.
I thought that the original parser also specifically parsed only a subset of types in this position, but that doesn't seem to be the case?
The relevant code is here:
I think we should pass some flag to the type parser to say "parse only stuff that can appear in an impl header". A sketch of what that flag could look like:
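(All names here are illustrative, not the actual grammar functions:)

struct TypeFlags {
    // in `impl <type>`, an `impl Trait` type can't meaningfully appear,
    // so don't even try to parse one
    allow_impl_trait: bool,
}

fn type_(p: &mut Parser, flags: TypeFlags) {
    match p.current() {
        IMPL_KW if flags.allow_impl_trait => impl_trait_type(p),
        // bail out, so the outer item loop can see a fresh impl block
        IMPL_KW => p.error("expected type, found `impl`"),
        _ => {
            // ... the rest of the type grammar ...
        }
    }
}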
https://github.com/matklad/rust-analyzer/pull/83 introduced a vscode setting that makes it possible to disable rust-analyzer's syntax highlighting. This setting is applied when the extension is initialized, which means that changes only take effect upon restarting vscode. It would be nice to apply changes to this setting in real time, instead of requiring a restart.
I plan to write a PR for this in the coming days, but if anyone else wants to fix this, just go for it and open a PR!
Are there any easyish language server features (low hanging fruit) that could be implemented?
When working on fall, one feature that turned out to be really useful was inline tests. Basically, the idea is that you place example Rust code alongside the grammar rules, like this:
pub rule where_clause {
'where' <commit>
{type_reference type_bounds {',' | <eof> | <not <not '{'>>}}*
}
test r"
fn f()
where T: Clone + Copy, Foo: Bar
{}
"
This allows one to easily map the structure of the rules to the code which is supposed to be parsed by them.
I think we should add the same feature to libsyntax. What's more, we will be using libsyntax itself to implement this feature, and that's why this issue has the fun label! So, here's the plan.
The tests will be placed in comments in the source code:
// test:
// fn foo() where T: Clone {}
// test:
// fn foo() where T: Clone, {}
fn where_clause(p: &mut Parser) {
...
}
Each test starts with the test: prefix and runs until the end of the comment or until the next test:. The tests are named after the function they are attached to, numbered in order: where_clause_01, where_clause_02.
To actually run these tests, we reuse the usual tests/data/parser/ok machinery. Specifically, we write a tool which parses the libsyntax source code, extracts all the tests, and writes each one of them as a separate file to the tests/data/parser/ok directory. The tool must deal smartly with adding, removing, and changing tests. Specifically, for an added test it should add a file to the ok directory with the next test number; for a removed test it should probably give an error (simply removing the test file would break test numbering); and for a changed test it should edit the existing file.
Now, the fun fact is that to write such a tool, you need the ability to parse Rust source code, and that is exactly what libsyntax is for! Currently, it can parse only a small subset of Rust, but it should be enough for the task at hand once we fill in a small number of missing things and make error recovery robust.
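The extraction step itself is small. A sketch, using plain line scanning for illustration (the real tool would walk comment tokens in the libsyntax parse tree):

/// Collect the bodies of all `// test:` comment blocks in a source file.
fn extract_inline_tests(source: &str) -> Vec<String> {
    let mut tests: Vec<String> = Vec::new();
    let mut current: Option<String> = None;
    for line in source.lines() {
        let trimmed = line.trim_start();
        if let Some(rest) = trimmed.strip_prefix("// test:") {
            // a new test begins; flush the previous one
            if let Some(t) = current.take() {
                tests.push(t);
            }
            current = Some(rest.trim_start().to_string());
        } else if let Some(rest) = trimmed.strip_prefix("//") {
            // continuation line of the current test, if any
            if let Some(t) = current.as_mut() {
                if !t.is_empty() {
                    t.push('\n');
                }
                t.push_str(rest.trim_start());
            }
        } else if let Some(t) = current.take() {
            // the comment block ended
            tests.push(t);
        }
    }
    if let Some(t) = current {
        tests.push(t);
    }
    tests
}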
I'm trying to implement textDocument/documentHighlight but am not sure how to find all references to a symbol. I was thinking of either using world_symbols and filtering on the file id, or looping through world.analysis().file_structure(file_id) and just returning all nodes that match the text in the params.
Implement a tree visitor which can make additional "syntax-like" checks on the tokens and tree produced by the parser.
Hm, looks like the ast is wrong: the structure should be this:
STRUCT_DEF
  NAMED_FIELD_LIST
    NAMED_FIELD
So, there should be a NamedFieldList between NamedFieldDef and StructDef, and the same for PosFieldDef. So, you need to tweak the ast section of grammar.ron.
Perhaps it's best to add some kind of
enum StructFlavor {
Tuple(PosFieldList),
Named(NamedFieldList),
Unit,
}
enum? I think it needs to be implemented manually, in the ast/mod.rs file.
Originally posted by @matklad in #110 (comment)
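For the manual part, a minimal sketch of the accessor, assuming the generated NamedFieldList / PosFieldList node types expose the usual cast constructor and node references are copyable (illustrative only, not the actual implementation):

impl StructDef {
    pub fn flavor(&self) -> StructFlavor {
        // a struct has at most one field list child, so the first match wins
        for child in self.syntax().children() {
            if let Some(list) = NamedFieldList::cast(child) {
                return StructFlavor::Named(list);
            }
            if let Some(list) = PosFieldList::cast(child) {
                return StructFlavor::Tuple(list);
            }
        }
        StructFlavor::Unit
    }
}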
This is a discussion issue to figure out how to make the combination of Rust macros and module system compatible with IDE requirements.
Let's start with formulating a specific narrow task:
maintain a "tree of modules" data structure for a package
Specifically, this data structure should handle goto definition on super in super::foo and on foo in self::foo::bar, and should find the root module for a given file.
To start, let's pretend that macros in Rust do not exist and solve the problem in this setting. This is how the ModuleMap data structure works.
It maintains a set of links, where each link is a pair of a module declaration (mod foo;) and the file to which this declaration resolves. To serve "go to parent module", the links are indexed by destination. To serve "go to child module", the links are indexed by source and name. To serve "find crate root", "go to parent" is applied repeatedly (which is OK, because crates are shallow).
The main property which makes this data structure "IDE-ready" is that invalidation is cheap: there's roughly a constant amount of work to update the data structure after a single file is changed, independent of the size of the package. When a file is added or deleted, only links which could point to it are updated (based on the file stem); when a file is changed, only links which originate from this file are invalidated.
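In code, the link set could look something like this (a sketch of the shape described above, not the actual ModuleMap definition):

type FileId = u32;

struct ModuleMap {
    links: Vec<Link>,
    // plus indexes over `links`: by `target` for "go to parent module",
    // and by `(source, name)` for "go to child module"
}

struct Link {
    // the file containing the `mod foo;` declaration
    source: FileId,
    // "foo" in `mod foo;`
    name: String,
    // the file the declaration resolves to
    target: FileId,
}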
Another nice property of the macro-less setting is that module tree is isolated to a single package: changes in upstream and downstream crates do not affect module tree at all.
Now, when macros enter the picture, everything becomes significantly more complicated. First of all, macros interfere with name resolution, so use statements start to matter, and upstream crates start to matter. Second, we really need to expand all macros to handle everything correctly (consider println!("{}", { #[path="sneaky.rs"] mod foo; 92 })).
This last point is worth emphasizing, because it makes the complexity of reacting to modifications pretty bad for IDEs. Consider, for example, this sequence of events:

- the user adds extern crate failure; at the crate root
- the user invokes goto definition on super

To handle the goto request, we need to re-expand all failure macros in the whole crate, and that's O(N) work for an O(1) modification. What makes this feel wrong is that although any macro expansion could affect the module tree, in reality almost none do. That is, for IDE-ready macro expansion the core requirement, I think, is:
Do not expand macros in function bodies unless doing analysis of the function body itself
This model breaks for two reasons: one is mod declarations with a #[path] attribute; another is impls (a macro-generated impl inside a function body is globally visible). I wonder if there's a language-level solution here? Could we require (via a warn lint) annotating such globally-visible macro invocations/declarations inside function bodies with #[global_effects] or something?
If we indeed restrict ourselves to expanding only module-level macros, then I think the original macro-less solution could work reasonably well if we model macro expansion as the addition of new files?
If you execute join lines many times very fast, the resulting program is different from what you get when you execute join lines the same number of times with pauses in between.
Steps to reproduce:

- open ra_cli/src/main.rs
- put the cursor inside main
- press Ctrl+Shift+J repeatedly until the whole main function is folded into a single line

In the end you should probably see parse errors, because the program was transformed into something inconsistent.
If you press Ctrl+Shift+J manually, say at intervals of one second, everything goes well.
The MVP for libsyntax2.0 is to create a small library for code editors, which provides some basic functionality, depending only on the syntax. Specifically, the plan is to port stuff that lives in fall here and here.
With the help of resolve_local_name, I think we should be able to rename local variables pretty reliably, so we should expose this functionality via our language server.
@kjeremy you might be interested in this.
salsa is an implementation of the red-green on-demand incremental computation algorithm from the rustc compiler. See the test for a simple example of its usage: maintaining the number of newlines per file and the total number of newlines.
The problem with the current implementation is that once you've asked a query, it sits in the cache forever, even if you never ask for its result again. We need to implement a garbage collection scheme which will clean out old results. I think something like "clean a query result if it is older than the current gen - X" should be good enough to start?
Here's the function that advances the world state by accumulating new changes; it shows how the generation counter is bumped, and it is probably the place where GC should happen.
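A minimal sketch of that "older than current gen - X" policy (toy types; salsa's real storage is keyed per query, and the last-used stamp would be refreshed on every cache hit):

use std::collections::HashMap;
use std::hash::Hash;

const MAX_AGE: u64 = 16; // the "X" above; tune as needed

struct Cached<V> {
    value: V,
    last_used: u64, // generation at which this result was last read
}

struct QueryCache<K, V> {
    gen: u64,
    entries: HashMap<K, Cached<V>>,
}

impl<K: Hash + Eq, V> QueryCache<K, V> {
    /// Called when the world state advances; also a natural point to GC.
    fn bump_gen_and_collect(&mut self) {
        self.gen += 1;
        let gen = self.gen;
        self.entries.retain(|_, e| gen - e.last_used <= MAX_AGE);
    }
}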
We use Ptr to access characters and Parser to access tokens. Unfortunately, they use different method names for lookups, bumping, etc. Let's unify them? The names in Parser are, in general, better:
- at(&self, kind: T) -> bool -- checks the current token. This might use some fancy overloading for lookahead of more than one token/symbol: .at(b"->").
- current(&self) -> T -- returns the current token.
- nth(&self, n: u32) -> T -- like current, but for n tokens ahead.
- bump(&mut self) -- bumps the current position by one token.
-- bumps current position by one token .Right now Rust Analyzer doesn't give much feedback while writing code. There is no autocompletion and you get only syntax errors. I was thinking about a good first step toward giving more feedback and came to the idea of showing warnings for unused variables. This should be doable by reusing resolve_local_name
, which was defined in #98
In the future we will probably want to extend this to detect unused struct fields, types and other stuff. But we need to begin with something.
The lexer should use the right Unicode property for whitespace. We need the rust-unic crate for this. This also needs all sorts of funky tests. Vertical tab, anyone? :)
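For reference, the property in question is Pattern_White_Space, which is a small, fixed set, so it can even be spelled out while waiting on a crate (and yes, it includes vertical tab, U+000B):

/// Is `c` in the Pattern_White_Space set that Rust uses for whitespace?
fn is_rust_whitespace(c: char) -> bool {
    matches!(
        c,
        '\u{0009}' // tab
            | '\u{000A}' // line feed
            | '\u{000B}' // vertical tab
            | '\u{000C}' // form feed
            | '\u{000D}' // carriage return
            | '\u{0020}' // space
            | '\u{0085}' // next line
            | '\u{200E}' // left-to-right mark
            | '\u{200F}' // right-to-left mark
            | '\u{2028}' // line separator
            | '\u{2029}' // paragraph separator
    )
}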
See
The problem is that we need to translate an offset in an edited file to a line/column pair, but the offset is in the file with the edit already applied! Using the old LineIndex just gives wrong results.
Ideally, we should translate the LineIndex using the edit itself, and use that to figure out the position.
A good test would be to use proptest to generate random edits, and compare results with literally applying edits and computing the correct line index.
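The core of the translation is mapping a new-file offset back through the edit. A sketch for a single replacement edit (Edit here is a hypothetical {delete range, insert text} pair, with byte offsets):

use std::ops::Range;

struct Edit {
    delete: Range<usize>, // range replaced in the old text
    insert: String,       // replacement text
}

enum OldPosition {
    /// The offset maps cleanly onto the old text.
    Offset(usize),
    /// The offset falls inside the inserted text; report the line/column
    /// relative to the insertion point instead.
    InsideInsert { lines: usize, col: usize },
}

fn translate(new_offset: usize, edit: &Edit) -> OldPosition {
    let insert_end = edit.delete.start + edit.insert.len();
    if new_offset < edit.delete.start {
        // before the edit: old and new offsets agree
        OldPosition::Offset(new_offset)
    } else if new_offset >= insert_end {
        // after the edit: shift by the size difference
        OldPosition::Offset(new_offset - edit.insert.len() + edit.delete.len())
    } else {
        // inside the inserted text: count lines within the insertion
        let prefix = &edit.insert[..new_offset - edit.delete.start];
        OldPosition::InsideInsert {
            lines: prefix.matches('\n').count(),
            col: prefix.len() - prefix.rfind('\n').map_or(0, |i| i + 1),
        }
    }
}

The proptest mentioned above would then check this against recomputing a fresh LineIndex from the edited text.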
Rust uses _ plus the XID_Start and XID_Continue character classes to define identifiers. We need to add lexer tests for some edge cases with these classes.
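For the tests, the unicode-xid crate (which rustc itself uses) gives the reference classification; a sketch of the kind of edge cases to cover:

use unicode_xid::UnicodeXID;

/// Reference implementation: `_` or XID_Start, then XID_Continue*.
fn is_ident(s: &str) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        Some(c) if c == '_' || c.is_xid_start() => chars.all(|c| c.is_xid_continue()),
        _ => false,
    }
}

#[test]
fn unicode_ident_edge_cases() {
    assert!(is_ident("Москва")); // Cyrillic letters are XID_Start
    assert!(is_ident("foo१२३")); // Devanagari digits are XID_Continue only
    assert!(!is_ident("१foo")); // ...so they can't start an identifier
    assert!(!is_ident("foo-bar")); // '-' is in neither class
}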
I would expect the following to have two scopes named x. Is that correct?
fn foo(x: String) {
let x : &str = &x<|>;
}
Extend Join Lines to automatically remove /// when joining doc comments:
/// `Kind` is stored in each node and designates
/// it's class. Typically it is an fieldless <|>
/// enum
After Join Lines action:
/// `Kind` is stored in each node and designates
/// it's class. Typically it is an fieldless enum
The code and tests for this functionality live here: https://github.com/matklad/rust-analyzer/blob/cd9c5f4ab205e092b87be6affe6d7e78d877dbf0/crates/ra_editor/src/typing.rs
The lexer does not support block-style comments at all. This should be fixed :)
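One wrinkle worth remembering: Rust block comments nest, so the lexer has to track depth. A sketch over a plain character iterator (the real lexer would use its Ptr cursor):

use std::iter::Peekable;
use std::str::Chars;

/// Consume a block comment body; the opening `/*` is already eaten.
fn scan_block_comment(chars: &mut Peekable<Chars<'_>>) {
    let mut depth: u32 = 1;
    while depth > 0 {
        match chars.next() {
            Some('/') if chars.peek() == Some(&'*') => {
                chars.next();
                depth += 1;
            }
            Some('*') if chars.peek() == Some(&'/') => {
                chars.next();
                depth -= 1;
            }
            Some(_) => {}
            None => break, // unterminated comment: report an error elsewhere
        }
    }
}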
We have a TokenSet abstraction over here: https://github.com/matklad/libsyntax2/blob/c8cf1d8cdac48f48caf9505bd5dc20dd2b962317/src/parser/event_parser/parser.rs#L70-L92
It is a set of SyntaxKinds, which is used during parsing for error recovery: the idea is to define TokenSets for things like FIRST(item) and then, when an error occurs, skip tokens until a token from this set is encountered.
We should:

- move TokenSet to a separate file
- use [u64; 4] to represent a bit-set of tokens
- make it possible to create const TokenSets
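A sketch of that bit-set representation (assuming SyntaxKind has a small integer discriminant; it is stubbed as u16 here):

#[derive(Clone, Copy)]
pub struct TokenSet {
    bits: [u64; 4], // room for 256 SyntaxKinds
}

impl TokenSet {
    pub const EMPTY: TokenSet = TokenSet { bits: [0; 4] };

    /// Usable in `const` contexts, so sets like FIRST(item) can be statics.
    pub const fn new(kinds: &[u16]) -> TokenSet {
        let mut bits = [0u64; 4];
        let mut i = 0;
        while i < kinds.len() {
            let k = kinds[i] as usize;
            bits[k / 64] |= 1 << (k % 64);
            i += 1;
        }
        TokenSet { bits }
    }

    pub fn contains(&self, kind: u16) -> bool {
        let k = kind as usize;
        self.bits[k / 64] & (1 << (k % 64)) != 0
    }
}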
What are the steps involved in determining function signature information? I want to take a crack at implementing textDocument/signatureHelp, and I think this could also be used in hover.
I have no idea how the attribute grammar should look. Is it still key-value pairs, or have we gone full token trees? The latter would be unfortunate: token trees mean zero support from IDEs: no extend selection for cases like #[foo = 1 + 2 + 3], because we don't know that 1 + 2 + 3 is an expression.