rust-marker / marker Goto Github PK
View Code? Open in Web Editor NEWAn experimental linting interface for Rust. Let's make custom lints a reality
Home Page: https://rust-marker.github.io/marker/
License: Other
An experimental linting interface for Rust. Let's make custom lints a reality
Home Page: https://rust-marker.github.io/marker/
License: Other
Currently, all nodes hold a local reference to the AstContext<'ast>
object. This uses and requires instructors, with little benefit. It might be smarter to have a singular static reference, which is used to request required information.
This approach is used for symbols in rustc AFAIK. That might be a good place to start the evaluation :)
#81 adds support to specify lint crates in Cargo.toml
files. For now only local paths and git repositories are allowed. In the future, registries like crates.io
should also be allowed.
This issue is currently blocked until the first version of marker_api
has been released to crates.io, as we're otherwise not able to upload a lint crate to crates.io. It might be possible to work around this restriction by defining a git dependency. 🤔 I'll still mark it as blocked for now.
The error messages of Rust are remarkable, creating them can be challenging, though. Rustc uses a Diagnostic
struct and a DiagnosticBuilder
to create these messages. For this project, I'd like to have something similar.
Things to keep in mind:
The Diagnostic
object has to be FFI safe (Maybe it can first use Vec
during construction and then swap to an FFI safe diagnostic)
The information of the Diagnostic
backend should be the common denominator off all backends as it kind of dictates, what each backend has to provide as a lint emission mechanism.
Rust-analyzer supports rustc's diagnostic structure AFAIK, so it's most likely safe to take that as a rough example.
And probably plenty of other things I can't think of rn.
API types use [repr(C)]
to allow transfer over FFI boundaries. This forces the structs to have a layout as declared in code, plus additional padding for alignment. While creating the API, I always just created the structs, without considering the order and size of the types. As a result, they can probably be optimized.
Additionally, I'd like some tests, to validate that the size doesn't unintentionally changes. Rustc does something similar here:
Creating a stable representation for items is currently held up on some fundamental building blocks which are vital to the language. These are mainly generics and types which also depend on generics.
This issue should track the process of implementing generics in rust-linting.
The design should be discussed in rust-marker/design#27. For now, I'll implement a basic representation, based on the grammar.
With nightly-2022-06-30
rustc's internal visibility representation changed to a simple enum that holds three vales. See https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/enum.Visibility.html
This is an interesting and perhaps a nicer representation that the one in linter_api
. For the sync I only temporarily fixed the conversion for rustc since it was already not working and every situation.
This issue has two tasks:
linter_driver_rustc
This issue is meant to keep track of PRs which might affect the next bump.
See https://doc.rust-lang.org/reference/attributes.html
Some lint crates might want to provide the user with configurations to enable/disable certain behavior. Clippy, has a few lints that use configurations specifically for this.
Marker should provide a common and simple way to add configuration values to a lint and loaded them from file. This unifies lint crate behavior and makes it more conformable to the user. Additionally, it allows marker to include the configuration in the lint documentation.
Configurations could be implemented using a marker.toml
file in the workspace root directory, like Clippy does with its clippy.toml
. Each crate could then have its own section, similar to this:
[lint_crate_1]
config = "something"
[lint_crate_2]
animals = ["ducks", "penguins", "cats"]
Some multilingual projects don't use cargo, but interact with the driver directly. The configuration should therefore be loaded by the driver. The best option is probably to move this into marker_adapter
.
This is a collection issue of several smaller cleanup tasks that should be handled. The main goal is to have cargo linter
run without any panics. Currently, it runs into a few todo!()
s in the type conversion.
TODOs
todo!()
s when running cargo linter
cx
argument from ID translation functions where it's not neededto_api_item_id_from_def_id
item(ItemId)
fetch function to AstContext
and use that in the adapter. This might be good to handle in another issue.In rustc and Clippy the entire compiler aborts on a panic. For rust-marker
I would like to change the behavior, to catch panics in specific lint crates and continue execution of the other working lint crates. Additionally, I would like to include information, like the current linted node as part of the output, for simple debugging. The basic system should be simple to implement™
During linting, a common pattern is to check for a specific type, method, or trait. In rustc these are identified using DefId
/LocalDefId
's or type relative checks for methods.
Marker currently doesn't provide a way to check for a specific item, as it only provides the ItemId
and no further information. Ideally, the AstContext
should provide a resolve method or something similar that takes a path like std::vec::Vec
and returns the ItemId
for comparison.
Clippy tries to avoid card coded paths in favor of diagnostic items. This makes sense in Clippy's context, especially when Clippy can just add the IDs to rustc types. See:
For Marker, this is not a viable option, as the rustc_diagnostic_item
attribute is compiler intern. Even if marker would add a similar attribute, users might not always be able to add these to items in question.
Hard-coded paths have one major problem: The user known path might only be a reexport of items from a different module. std::iter::Iterator
for instance, is actually located at std::iter::traits::iterator::Iterator
. If we require the user to specify the absolute path, the lints would break if the items would be moved internally. If we accept the public path, we have to resolve the item, which could potentially be expensive, especially for modules with glob imports.
Ideally, I'd like to choose the second option. Maybe we can try to reuse rustc's path resolution, to find the items in question?
DUMMY_SP
was used to create a RestPat
as rustc doesn't specifically store a Span
for it. This should be updated to use an actual span, over the ..
token.Cargo implements some caching for rustc drivers by default. This can cause confusing, when cargo returns cached results instead of running the driver again. In Clippy, this happened for the Cargo.toml
and clippy.toml
files. Note that changes to .rs
files will cause a rerun by default.
Ideally, marker should rerun, if relevant files have been changed. rust-lang/rust-clippy#9707 might be a good example on how to implement this. The cache should also be avoided if the lint crates have changed, either via console arguments, or in the Cargo.toml
file.
Additionally, it might be worth to also provide lint crates the ability to define relevant external files, that should cause a rerun. However, this could also be too driver-specific.
The driver like Clippy used to have a complicated way to find the system root for the compiler files. rust-lang/rust#103660 added a new uniform way that should also be used by this rustc driver.
With most of the item backend implemented, it's finally time to run it on the first real crates and see what errors, todos and limitations come up. This is a collection of several paper cuts I ran into. Most of these are specific for the cargo interface.
cargo linter
passes linter
as the second argument to cargo-linter
which isn't handled correctlycargo-linter
uses nightly features, which means that cargo install ...
would require a nightly toolchain, this should be avoidable, at least for the cargo subcommand crate.cargo-linter
has a dev feature, which is turned on by default (It should be off by default)cargo-marker
marker_api
version that the driver uses
cargo build --config "dependencies.marker_api=\"0.0.0\""
marker_api
versioncargo-linter
starts the driver with the toolchain from the local rust-toolchain
which can cause errors. Until we have the preimage to be a component in rustup, we need to manually handle the toolchain for the driverto_string_lossy
Rustc has a good API for this that clippy uses: https://doc.rust-lang.org/stable/nightly-rustc/rustc_errors/struct.DiagnosticBuilder.html
Marker should probably have something like this too :)
Clippy and Rustc both use ui test to test lints. These tests run the compiler on test files and compares the output to a .stderr
file that is stored next to the save file.
Dynlint has an extra dylint_testing crate, which is smart and something I believe worth aiming for.
Currently, there are two major ui test crates trybuild and compiletest_rs. From these, I like the interface of trybuild
more, as it's simpler. However, it lacks some customizability, which is required lints. Clippy therefore uses compiletest_rs. See Clippy's compile-test.rs
file
It's also worth noting, that both ui crates use rustc internally. This means that ui tests using these crates require us to use the rustc lint driver for tests. For now, I think this is fine, but we might want to ensure that our testing crate doesn't expose the specific driver :)
#68 Adds a bunch of patterns to the API which require conversion code in rustc's driver. The patterns in question are:
Note that currently some of these might not be testable, as they can only occur in expressions. That's just something we'll have to live with :)
Currently, the CI only checks that the marker_api
crate can be compiled on stable. It's up to the reviewer and programmer to ensure that the API doesn't break. This is fine right now, as the API is under active development and still changes frequently. However, the goal of Marker is to create a stable API.
The rust ecosystem has tools to prevent accidental public API changes, like cargo-semver-checks. Once v0.1.0 or a higher version has been released, I think it would be good to utilize such a tool.
This issue is blocked for now until v0.1.0 or a later point, when we've defined what stability means for marker and ii API breakage will be allowed under specific circumstances.
rust-marker uses kind enums to represent different node types, like items. These currently hold references with the 'ast
lifetime inside them. This might not be ideal for drivers like RA, which may change and remove nodes after a linting pass. It might be better to remove the lifetime. One option for this could be to store the struct directly in the enum and only pass enum references to the API. This still ensures that users can only access immutable references while removing the 'ast
lifetime. However, this will affect the size of *Kind
enums, as they will have the size of the largest variant and not just a reference.
This is an idea worth exploring before v1.0.0
See
ItemKind
enum: marker/marker_api/src/ast/item.rs
Lines 70 to 85 in 2a11448
Currently, the lint crates have to be specified via console arguments. In a real world applications users most likely have a set of crates they want to run every time. Therefore, I suggest specifying lint crates in a file like Cargo does for dependencies.
There are several design decisions as part of this issue:
Cargo.toml
or a new marker.toml
file?Generally see: https://doc.rust-lang.org/cargo/commands/index.html for cargo comments and documentation
Drivers that are not rustc itself, currently need to use the compiler that are compiled with to compile user lints, not the compiler installed on the user system. Solving this probably needs a more detailed C api over the interface traits. Not that it needs to be addressed immediately, just something to consider in long-term.
Create an initial version of the stable pattern representation.
TupleStructPattern
: This will for now be expressed as a struct pattern. We'll have to see if this is really practical.LiteralPattern
RangePattern
PathPattern
(Requires path expressions)MacroInvocation
: Macros are a whole different beast for another time :)GroupedPattern
: This is a syntactical pattern and is expressed in the AST. Therefore, we can ignore it :)See: https://doc.rust-lang.org/reference/patterns.html
This issue only focuses on the API representation. The driver backend should be added in a separate PR. This reduces the PR size and makes sure that I focus on the representation and not the way rustc stores the data :)
Follow up of #49
The rustc driver tested out several ways how an AST can be translated and how callbacks can be implemented. Now that some progress has been made on #8 we have an almost set API for how lint crates and drivers communicate.
The next step is now to restructure or rewrite the rustc driver to be maintainable and follow this new API.
Goals:
I quite like the clippy::pedantic
category and would love to have it enabled for all projects. I left it out at first to allow for fast prototyping. Now I think it's slowly time to reenable it for crates and fix the warnings or allow some lints.
Create an initial version of the stable expression representation.
See: https://doc.rust-lang.org/reference/expressions.html
This will be another big one. On the other hand, this'll probably be the final stretch before v0.0.1
This issue is meant to keep track of PRs which might affect the next bump.
The public interface currently contains some traits like Span<'ast>
, Ty<'ast>
and *Item
. This is cool to implement the functions in the driver. However, this exposure of traits enables users to also implement the traits themselves and pass them back to the context. It also requires the drivers to implement other required traits, like Debug
, Hash
etc.
Instead, it would be nice to wrap all these into structs which will be passed to lint crates. In that way, we can also have a uniform driver independent Debug
and Display
implementation.
Lint crates are compiled by cargo-marker
, the path of the build binary is passed to the adapter via environment values. The loading of these libraries should ideally use OsString
to ensure that it works with whatever interesting path a user might throw at it. This requires a bit of auditing.
cargo-marker
has been checked as part of #66 and should be fine. A second pair of eyes would be appreciated :)
This issue is meant to keep track of PRs which will most likely effect the next sync.
Almost all communication between the linter driver and lint-crates requires the exchange of information. For rust-marker, this is done with marker_api
types. Special care needs to be taken, since Rust doesn't have a stable ABI. To ensure that this is done correctly, I would like to see a CI step that uses the -Z randomize-layout
flag.
It should be checked, that the -Z randomize-layout
is also passed on by cargo-marker
which compiles the driver and lint crates
I imagine something like running this on linux (Since those are the fastest GH runners), and start the test suite two or three times with random layouts. Just running it once might not catch all invalid instances. When we have more tests, the test suite can be reduced to a small subset. It will most likely also be enough if this is run as part of bors tests, just to not waste GH runner time more than needed.
I had to experiment for quite some time until I got the first lints to actually appear. In the end, this is what I did:
linter_lints/tests/ui/find_item.rs
to linter_lints/examples/find_item.rs
cargo run --bin cargo-linter -- -l ./linter_lints -- --package linter_lints --example find_item
RUSTC_WORKSPACE_WRAPPER=target/debug/linter_driver_rustc LINTER_LINT_CRATES=target/linter/lints/liblinter_lints.so cargo check --package linter_lints --example find_item
because I want to understand what is happening under the hood; I find the wrapper crates confusing)Perhaps something like this can be added to the README, to make it easier for new people to check out this work?
The CI currently tests everything on nightly, while some crates should and have to be compilable on stable. A simple step should be, to test that most crates compile on stable.
Do we maybe want to add bors-ng to this repo?
I really like the stability of rust-lang repos and adding bors could ensure that we keep our master green. Since it's a free app, I think it would be a nice addition. The question is now, if everyone is okay with this.
cc @DevinR528?
Create an initial version of the stable item representations.
Open Items:
See: https://doc.rust-lang.org/reference/items.html
This issue only focuses on the API representation. The driver backend will be added in a separate PR. This reduces the PR size and makes sure that I focus on the representation and not the way rustc stores the data :)
In this FIXME
comment here what do you mean by driver-specific lint instances?
When I created the prototype, I hacked together a basic driver ↔ lint-crate communication. After reworking the communication and learning more about FFI stuff, it turns out that the current implementation is unsound. This is a collection of all issues that need to be fixed:
#[repr(C)]
types. These need to be replaced with FFI safe types or at least safely be transferred over the FFI boarder
Option<>
, &str
&dyn Trait
pointers from the API#[repr(C)]
(#6)marker_api::LintPass::registered_lints
returns Box<[..]>
which is not FFI safeNote: We can change #[repr(C)]
to a different ABI as long as it's stable like C
The current API still exposes some enums and internal types, which should ideally only be available to drivers. This issue tracks the progress of removing/hiding them. Enums that only describe a characteristic should ideally be replaced with is_
functions.
Mutability
from api in favor of is_mut()
UseKind
Constness
There are also some outdated types, which should be removed all together:
Sym
in favor of SymId
and returning &'ast str
I've finally decided on a name for this project. The name will be: marker
rust-linting was always a super descriptive, but at the same time boring name. At least for the main crates. I wanted to find a name, which works well for commands, is potentially related to Clippy and just gives the project some character.
I considered several names like checker
, linter
, guppy
etc but never resonated with any of them. Well, I found guppy
pretty cool, since I played a lot of "The binding of Isaac: Rebirth" but wasn't sure about the origin of that name and someone else already claimed that name on crates.io. So back to the drawing board and that's when I came up with the name marker like a text marker. cargo marker
also roles of the tong, at least in my opinion.
So yeah, that's the name I settled on. The marker
crate name has sadly already been claimed, but all the needed names like cargo-marker
, marker_api
etc not. Well, that was until 21 days ago.
The next step, is now to rename all crates and name references to marker. Better to do it sooner rather than later :)
See: https://github.com/rust-marker/marker-crate-reservations
Almost every enum and struct in the API has the #[non_exhaustive]
marker to ensure that it can easily be expanded. This is nice for stability, but not ideal for driver implementations. There could also be users who would prefer a failing compilation, rather than adding wildcards to every match statement.
For this reason, I would like to have a feature (which is enabled by default) that can enable the #[non_exhaustive]
marker.
The implementation will probably be as simple as using:
#[cfg_attr(feature = "add-non-exhaustive", non_exhaustive)]
The feature name is just an example, another name might be better
The feature should also include some documentation, that disabling it (not using default features) might cause compilation failures when the API is updated.
The current lint crate loading implementation sends a trait object over a C ABI, this is AFAIK undefined behavior. Currently, it works quite well, but for the stable version a loading mechanism should be selected that uses defined behavior (and ideally only safe code), to ensure that the interface doesn't need to be changed in the future.
See adapter docs
For using another driver instead of rustc (like rust-analyzer) it should not depend on rustc representation of hir and ast, and ideally it should work on stable toolchain, without #![feature(rustc_private)]
. Is there a plan for doing so? And how we can help in that direction?
The CI scripts have a restriction, so they only run, if .rs
files have been changed, this is good for the normal script, but prevents bors from testing and merging on the try
and staging
branches. Removing the restriction should be enough to make the script work again.
Noticed in #71
The Cargo interface of the marker should ideally provide some capabilities for testing. It would be really nice to have UI Tests directly integrated. Similar to how compiletest-rs
is used in Clippy. (The Clippy implementation lints a file and compares the output to a .stderr
file. The test passes if the output is the same, and otherwise it fails.)
When I run the command in the readme it builds and I get dbg!
output but I also get
error: failed to run custom build command for `linter_api v0.1.0 (/home/devinr/aprog/rust/__forks__/rust-linting/linter_api)`
Caused by:
could not execute process `/home/devinr/aprog/rust/__forks__/rust-linting/target/debug/build/linter_api-aa48e93580444ad5/build-script-build` (never executed)
Caused by:
No such file or directory (os error 2)
warning: build failed, waiting for other jobs to finish...
I only noticed when I tried to only dbg!
print the ItemType::Function
.
Lints often have more context and reasoning than what's included in the emitted diagnostic message. Rustc and Clippy both provide an external list of the lints with a detailed explanation and suggestions how to fix the problem:
Marker should also provide an option to display which lints are installed, along with the documentation provided by them in the declare_lint!
macro. This could be in the form of a cargo marker doc
comment, that fetches the description from the lint crates and then combines them into a website. Ideally searchable, and more than just plain HTML.
StmtKind
as initially implemented in #82 doesn't have a StmtData
trait, like the other *Kind
enums in Marker. The StmtKind
enum is special in comparison to other *Kind
enums as it resolves to different node types. In comparison, ExprKind
will always wrap an expression node. Additionally, the parent of an item and expression nodes might not always be a statement, instead they can have a different expression or item as their parent.
For most common functionality, this should be fine. The typical span()
method can simply be delegated to the span()
method of each statement type. The only thing which can't be delegated that easily, is an id()
method, as this would require the nodes to store the id.
I see a few possible solutions for this:
Option<CommonStmtData>
in each node, which can be a statement.
id()
or similar functions, which require stored data.
StmtId
be an enum, that either wraps the ItemId
, ExprId
or unique number for the let statement
StmtKind
.
See: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/struct.Stmt.html
While refactoring rustc's driver, I noticed that the API currently has two representations for an item/path without generic arguments.
GenericsArgs
struct is wrapped in an Option
making it clear whether they're empty or notGenericsArgs
are constructed with an empty slice inside them.Rustc also uses both of these representations. I'm not sure which one is simpler for normal users. We could just add a method is_empty()
to GenericsArgs
. Not having them wrapped in an extra option could be nice for lints, that want to check for specific generics :)
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.