dtolnay / syn
Parser for Rust source code
License: Apache License 2.0
I'm looking at possibly using this in Diesel as we switch to Macros 1.1. I was surprised to see that there's a lot of panic! in the parser. I'd expect this function to return a Result and allow the caller to decide how to handle the error.
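To illustrate the request, here is a minimal sketch of a parser entry point that reports failure through a Result instead of panicking. Both `ParseError` and `parse_unit_struct` are hypothetical names invented for this example, and the toy grammar (unit structs only) is an assumption; none of this is syn's actual API.

```rust
use std::fmt;

/// Hypothetical error type for this sketch; not part of syn.
#[derive(Debug, PartialEq)]
pub struct ParseError {
    pub message: String,
}

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "parse error: {}", self.message)
    }
}

fn err(message: &str) -> ParseError {
    ParseError { message: message.to_string() }
}

/// Parses a unit struct declaration such as `struct Foo;` and returns its name.
/// Every failure path returns `Err`, so the caller decides how to react.
pub fn parse_unit_struct(input: &str) -> Result<String, ParseError> {
    let rest = input
        .trim()
        .strip_prefix("struct ")
        .ok_or_else(|| err("expected `struct`"))?;
    let name = rest
        .strip_suffix(';')
        .ok_or_else(|| err("expected `;`"))?
        .trim();
    if name.is_empty() || !name.chars().all(|c| c.is_alphanumeric() || c == '_') {
        return Err(err("invalid struct name"));
    }
    Ok(name.to_string())
}
```

A caller such as a derive implementation could then choose between unwrapping, attaching more context, or emitting its own diagnostic.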
The current names are based on the names of libsyntax AST types, but the grammar may be a better model.
https://doc.rust-lang.org/grammar.html
For example let_decl instead of stmt_local.
I realise you're still pre-1.0, but it is probably worth considering the forwards compatibility story early. A reason that macros are moving to tokens/strings rather than AST is for the stability of procedural macros. This requires that libraries such as Syn have a forwards compatibility policy.
The idea is that when we add a feature to Rust which changes the syntax, you need to add this to Syn, and to handle such code downstream proc macros must upgrade to the new version. But, the important thing is that if downstream macros do not want to handle the new feature, they can upgrade Syn without breaking.
E.g., say we add unions (well, we already have, but imagine we hadn't), then where previously an Item enum had Struct and Enum variants, now it needs Union too. That would be a breaking change, since exhaustive matches are no longer exhaustive.
Exactly how you handle forwards compatibility is an open question. You could combine an unenforced policy that clients have to abide by with some degree of coding techniques, or you could rely solely on code (but that might not be possible given some other design decisions).
One example might be that clients should not pattern match structs, Syn avoids struct variants, and every enum has an Unknown variant which clients should not match.
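A minimal sketch of the Unknown-variant idea, assuming a heavily simplified `Item` enum (not syn's real definition) and a hypothetical client function `describe`. Because clients never name `Unknown` and always include a wildcard arm, a later release could add a `Union` variant without breaking them.

```rust
/// Simplified stand-in for an AST item type; an assumption for this sketch.
#[derive(Debug)]
pub enum Item {
    Struct(String),
    Enum(String),
    // A later release could add `Union(String)` here.
    /// Placeholder that clients must not match by name.
    #[doc(hidden)]
    Unknown,
}

/// Hypothetical client code: handles the variants it knows about and
/// falls back to a wildcard arm for everything else.
pub fn describe(item: &Item) -> String {
    match item {
        Item::Struct(name) => format!("struct {}", name),
        Item::Enum(name) => format!("enum {}", name),
        // Covers `Unknown` today and any variant added in the future.
        _ => "unsupported item".to_string(),
    }
}
```

This relies on policy rather than enforcement: nothing stops a client from matching `Item::Unknown` directly. Later versions of Rust provide the `#[non_exhaustive]` attribute, which makes the compiler require the wildcard arm in downstream crates.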
This is in a pretty sorry state - basically the bare minimum required for Serde.
When building with cargo build --release, the build fails with (signal: 11, SIGSEGV: Invalid memory reference). Building with cargo build (no --release flag) works. Here is the full output, along with the Rust version:
ubuntu@host:~/syn$ git describe
0.8.0
ubuntu@host:~/syn$ cargo build --release --verbose
Updating registry `https://github.com/rust-lang/crates.io-index`
Compiling quote v0.2.0
Running `rustc /home/ubuntu/.cargo/registry/src/github.com-1ecc6299db9ec823/quote-0.2.0/src/lib.rs --crate-name quote --crate-type lib -C opt-level=3 -C metadata=9442466506b24325 -C extra-filename=-9442466506b24325 --out-dir /home/ubuntu/syn/target/release/deps --emit=dep-info,link -L dependency=/home/ubuntu/syn/target/release/deps --cap-lints allow`
Compiling syn v0.8.0 (file:///home/ubuntu/syn)
Running `rustc src/lib.rs --crate-name syn --crate-type lib -C opt-level=3 --cfg feature=\"default\" --cfg feature=\"printing\" --cfg feature=\"parsing\" --cfg feature=\"quote\" -C metadata=977935f812d0e598 --out-dir /home/ubuntu/syn/target/release/deps --emit=dep-info,link -L dependency=/home/ubuntu/syn/target/release/deps --extern quote=/home/ubuntu/syn/target/release/deps/libquote-9442466506b24325.rlib`
error: Could not compile `syn`.
Caused by:
Process didn't exit successfully: `rustc src/lib.rs --crate-name syn --crate-type lib -C opt-level=3 --cfg feature="default" --cfg feature="printing" --cfg feature="parsing" --cfg feature="quote" -C metadata=977935f812d0e598 --out-dir /home/ubuntu/syn/target/release/deps --emit=dep-info,link -L dependency=/home/ubuntu/syn/target/release/deps --extern quote=/home/ubuntu/syn/target/release/deps/libquote-9442466506b24325.rlib` (signal: 11, SIGSEGV: Invalid memory reference)
ubuntu@host:~/syn$ rustc --version --verbose
rustc 1.12.0 (3191fbae9 2016-09-23)
binary: rustc
commit-hash: 3191fbae9da539442351f883bdabcad0d72efcb6
commit-date: 2016-09-23
host: x86_64-unknown-linux-gnu
release: 1.12.0
Delegate to the quote! macro from the quote crate, then parse the resulting tokens into the appropriate AST type. Example usage:
let attr: Attribute = quote_attribute!{ #[derive(Debug, Clone, #a, #b)] };
Diesel uses the following to parse attributes of the form #[changeset_options(treat_none_as_null = "true")]:
match options_attr.value {
    syn::MetaItem::List(_, ref values) => {
        if values.len() != 1 {
            usage_err();
        }
        match values[0] {
            syn::MetaItem::NameValue(ref name, ref value)
                if name.as_ref() == "treat_none_as_null" => value == "true",
            _ => usage_err(),
        }
    }
    _ => usage_err(),
}
I expect this use case to be pretty common so let's provide helpers to make it less bad.
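One possible shape for such a helper, sketched against a simplified stand-in for the meta item type (string values only). The `name_value` method is hypothetical and is not an existing syn function. Unlike the Diesel snippet, it does not insist that the list has exactly one entry.

```rust
/// Simplified stand-in for a meta item type; values are reduced to strings.
#[derive(Debug, Clone, PartialEq)]
pub enum MetaItem {
    Word(String),
    List(String, Vec<MetaItem>),
    NameValue(String, String),
}

impl MetaItem {
    /// Hypothetical helper: for `#[attr(key = "value")]`, returns `Some("value")`.
    /// Returns `None` if this is not a list or the key is absent.
    pub fn name_value(&self, key: &str) -> Option<&str> {
        match self {
            MetaItem::List(_, items) => items.iter().find_map(|item| match item {
                MetaItem::NameValue(name, value) if name.as_str() == key => {
                    Some(value.as_str())
                }
                _ => None,
            }),
            _ => None,
        }
    }
}
```

With something like this, the caller's nested match would shrink to a single lookup followed by a comparison against "true".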
syn fails to parse a valid enum like https://is.gd/bsNmUd. This is a blocking issue for num-macros to work with syn.
Based on the one from libsyntax.
This workaround for a bug in Rust 1.12.0 is no longer necessary as of rust-lang/rust#37173. The workaround can be removed once we no longer support 1.12.0.
Reduced test case:
syn::parse_expr("match foo { Some(a) => a, , None => 0 }").unwrap()
Output:
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: "failed to parse tokens after expression: \" foo { Some(a) => a, , None => 0 }\""', ../src/libcore/result.rs:799
The error message says that the part of the input that failed to parse is: all of it, except for the initial match keyword. The message is similar when parsing a much larger match expression, with the syntax error somewhere in the middle of it.
Need to be careful about compile time; the macro parsing code is not currently compiled in Macros 1.1 mode.
Reported by email from @gregkatz:
I ran into a little bit of a problem doing the syn parsers. It's a little more complicated than I realized. Specifically, I was experimenting with the expression after the while keyword to see what works and what doesn't in the actual Rust compiler, and my parser doesn't actually match the behavior of the compiler. For example here's how the compiler handles the following:
whilevariable {} //error
whiletrue //error
while{variable} {} //ok
while(variable) {} //ok
while1<2 {} //error
while-1<2 {} //ok
while&1<&2 {} //ok
while*variable {} //ok
while!variable {} //ok
I believe my parser as it currently stands would allow all of this, but I'm not sure how to make it match the behavior of the real compiler.
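The listed cases are consistent with a word-boundary rule: the keyword only counts when the next character could not continue an identifier. A minimal sketch of that check follows. The `keyword` function is a hypothetical helper written for this example, not syn's parser, and it only covers the boundary question, not whatever else the compiler checks.

```rust
/// Returns the remaining input after `kw`, or `None` when `kw` is absent or
/// is merely the prefix of a longer identifier (as in `whilevariable`).
pub fn keyword<'a>(input: &'a str, kw: &str) -> Option<&'a str> {
    let rest = input.strip_prefix(kw)?;
    match rest.chars().next() {
        // An identifier character right after the keyword means the whole
        // word is an identifier, so the keyword parser must fail.
        Some(c) if c.is_alphanumeric() || c == '_' => None,
        _ => Some(rest),
    }
}
```

Under this rule `while1<2 {}` fails because `while1` reads as one identifier, while `while-1<2 {}` succeeds because `-` cannot be part of an identifier.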
If you use quote! on a syn::Generics, it gives <a, T>. It should be <'a, T>.
Rust allows arbitrary code as an array length:
pub struct Screen(pub [Color; {
    fn holy_smokes() {
        println!("why am I part of this type?");
    }
    12288
}]);
To minimize compile time, syn supports integer literals only (EDIT: now a few other simple expressions too, but still limited). The difference in compile time is more than a factor of 2 between supporting integers only vs supporting expressions, so currently I believe the tradeoff makes sense.
Given that this is more restrictive than Rust, it would be nice to give a better message than the usual "failed to parse macro input" when the failure is related to an array length.
Doesn't matter for Macros 1.1, but for parse_crate it would be helpful to be more specific about where parsing failed. Possibly behind a feature gate if it affects compile time.
Some ideas about error management: nom/docs/error_management.md
Parsing structs and enums is enough for Macros 1.1 but parsing all of Rust may be useful for other things.
This is the subset of expressions that are allowed to be used in array lengths and enum discriminants. The authoritative list is in eval_const_expr_partial.
As in:
pub struct Screen(pub [Color; SCREEN_SIZE]);
pub struct Screen(pub [Color; SCREEN_SIZE as usize]);
Currently we parse labels as lifetimes, then always need to do lt.map(|lt| lt.ident). Let's factor this into a separate label parser.
syn is currently unable to parse complex explicit discriminant expressions in enum definitions, where "complex expression" can be as simple as a negative integer.
Set up a nightly Travis build that clones the master branch of rust-lang/rust and runs test_round_trip over all of it.
Implement the new syntax from rust-lang/rust#34764.
The match keyword in #48 gets parsed as an ident after backtracking from trying to parse a match expression. Reserved keywords should never parse as ident.
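A minimal sketch of an identifier parser that refuses reserved words, so that backtracking out of a failed match expression cannot fall through to "ident named match". The `ident` function and the `KEYWORDS` table are assumptions for this example. The table is meant as an illustrative list and should be checked against the Rust Reference before being relied on.

```rust
/// Illustrative keyword table for this sketch; verify against the Rust Reference.
const KEYWORDS: &[&str] = &[
    "as", "break", "const", "continue", "crate", "else", "enum", "extern",
    "false", "fn", "for", "if", "impl", "in", "let", "loop", "match", "mod",
    "move", "mut", "pub", "ref", "return", "self", "Self", "static", "struct",
    "super", "trait", "true", "type", "unsafe", "use", "where", "while",
];

/// Parses a leading identifier and returns `(ident, rest)`.
/// Fails on reserved keywords instead of accepting them as identifiers.
pub fn ident(input: &str) -> Option<(&str, &str)> {
    let first = input.chars().next()?;
    if !(first.is_alphabetic() || first == '_') {
        return None;
    }
    // The identifier ends at the first character that cannot continue it.
    let end = input
        .find(|c: char| !(c.is_alphanumeric() || c == '_'))
        .unwrap_or(input.len());
    let (word, rest) = input.split_at(end);
    if KEYWORDS.contains(&word) {
        None
    } else {
        Some((word, rest))
    }
}
```

Because the whole word is compared against the table, `matches` is still accepted while `match` is rejected.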
Test case:
syn::parse_expr("macro_rules! noop_expr { ($e: expr) => { $e } }").unwrap();
Output:
thread 'test' panicked at 'called `Result::unwrap()` on an `Err` value: "failed to parse tokens after expression: \"! noop_expr { ($e: expr) => { $e } }\""', ../src/libcore/result.rs:799
pub enum BlockCheckMode {
    Default,
    Unsafe,
}
pub enum Unsafety {
    Unsafe,
    Normal,
}
Other than the ordering and choice of Normal vs Default, do these really differ from each other?
Inner attributes are supported at the top-level of a file and inside modules, but Rust allows them in many other places.
The following test cases depend on inner attributes:
We don't need a half-baked Aster clone. It only exists because Serde was using the real Aster pretty heavily. Almost all of those use cases would be better served by #5. The rest we can handle with more specific helper functions.
syn has a parse_macro_input function and a MacroInput type. Despite their names, they are very specific to implementing a custom derive.
Macros 1.1 will presumably support macros like foo!(), and syn will be useful for implementing those too. (Maybe with something like Vec<TokenTree> to represent the input.)
Should parse_macro_input and MacroInput be renamed to parse_derive_input and DeriveInput? Or maybe parse_type_definition and TypeDefinition?
Write a test to:
syn >> print using syn >> parse using syntex
syntex
syntex AST from 2 and 3 are identical

I think this would give us more confidence in the parser than any unit test suite we could write, and it is also far easier to implement than individually testing every parser element. Once we make more progress on #4 we could run this test against a large repo like the full rustc source code.
cc @gregkatz. I plan to work on this today or tomorrow. In the meantime you don't have to worry about unit testing parsers you write (unless it is useful to you in implementing them). Let's focus on flying through #4 without tests and leave testing to this one.
Parenthesization is represented in the AST which means precedence is not relevant when just reading source code and writing it back, so this is not urgent, but it becomes relevant if somebody wants to process or transform the AST.
let ast = /* ... */;
let ident = /* ... */;
let impl_generics = /* ... */;
let where_clause = /* ... */;
quote! {
    #ast
    impl #impl_generics Trait for #ident #impl_generics #where_clause {
        /* ... */
    }
}
This is a great library!
With syntex, it was possible to support stable Rust via build.rs and source-code processing. Is there a way to do that with syn?
The current implementation does a bunch of cloning and returns (Generics, Generics, WhereClause). Instead it should return (ImplGenerics<'a>, TyGenerics<'a>, &'a WhereClause) defined as:
struct ImplGenerics<'a>(&'a Generics);
struct TyGenerics<'a>(&'a Generics);
These are wrappers around the reference that implement ToTokens in the right way.
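A minimal sketch of the borrowing wrappers, assuming a much simplified `Generics` type and using `std::fmt::Display` as a stand-in for ToTokens so the example has no dependencies. The `split` method name is hypothetical, and the where clause is left out for brevity.

```rust
use std::fmt;

/// Simplified stand-in for a generics type; an assumption for this sketch.
pub struct Generics {
    /// Lifetime names without the leading apostrophe, e.g. "a".
    pub lifetimes: Vec<String>,
    /// Type parameters as (name, bounds), e.g. ("T", ["Clone"]).
    pub ty_params: Vec<(String, Vec<String>)>,
}

/// Prints parameters with bounds, for use after `impl`.
pub struct ImplGenerics<'a>(pub &'a Generics);
/// Prints parameter names only, for use after the type name.
pub struct TyGenerics<'a>(pub &'a Generics);

impl Generics {
    /// Hypothetical splitter: borrows `self` twice instead of cloning.
    pub fn split(&self) -> (ImplGenerics<'_>, TyGenerics<'_>) {
        (ImplGenerics(self), TyGenerics(self))
    }
}

fn write_params(f: &mut fmt::Formatter<'_>, g: &Generics, with_bounds: bool) -> fmt::Result {
    if g.lifetimes.is_empty() && g.ty_params.is_empty() {
        return Ok(());
    }
    let mut parts = Vec::new();
    for lt in &g.lifetimes {
        // Lifetimes are printed with their apostrophe: 'a, not a.
        parts.push(format!("'{}", lt));
    }
    for (name, bounds) in &g.ty_params {
        if with_bounds && !bounds.is_empty() {
            parts.push(format!("{}: {}", name, bounds.join(" + ")));
        } else {
            parts.push(name.clone());
        }
    }
    write!(f, "<{}>", parts.join(", "))
}

impl<'a> fmt::Display for ImplGenerics<'a> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write_params(f, self.0, true)
    }
}

impl<'a> fmt::Display for TyGenerics<'a> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write_params(f, self.0, false)
    }
}
```

Both wrappers hold only a reference, so splitting costs nothing beyond two pointer copies, and the difference between the impl position and the type position lives entirely in the printing code.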
Here is how libsyntax does it: https://github.com/rust-lang/rust/blob/master/src/libsyntax/fold.rs
cc @SimonSapin
For example:
let s: String = /* ... */;
let lit: Lit = s.into();
There should be conversions for all the Lit variants.
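A sketch of what such conversions could look like, assuming a reduced `Lit` enum with four variants. The real type has more variants and different payloads, so the variant set here is illustrative only.

```rust
/// Reduced stand-in for a literal type; the variant set is an assumption.
#[derive(Debug, Clone, PartialEq)]
pub enum Lit {
    Str(String),
    Int(u64),
    Bool(bool),
    Char(char),
}

impl From<String> for Lit {
    fn from(s: String) -> Lit {
        Lit::Str(s)
    }
}

impl<'a> From<&'a str> for Lit {
    fn from(s: &'a str) -> Lit {
        Lit::Str(s.to_string())
    }
}

impl From<u64> for Lit {
    fn from(n: u64) -> Lit {
        Lit::Int(n)
    }
}

impl From<bool> for Lit {
    fn from(b: bool) -> Lit {
        Lit::Bool(b)
    }
}

impl From<char> for Lit {
    fn from(c: char) -> Lit {
        Lit::Char(c)
    }
}
```

Implementing `From` gives callers `.into()` for free, which is what the `let lit: Lit = s.into();` line relies on.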
This is a requirement for implementing something like rustfmt against syn.
Here's a piece of code rudely extracted from my playing-around:
syn::ItemKind::Fn(_, _, _, _, _, block) => {
    for stmt in block.stmts {
        match stmt {
            syn::Stmt::Expr(i) => {
                // i.node; // `node` is private
                println!("e {:?}", i); // but I can see the interesting block under here
            }
            _ => unimplemented!(),
        }
    }
    // Descend
}
I am trying to parse this code:
pub fn unsafe_block_inside() {
    unsafe {}
}
My goal is to be able to answer a "simple" true/false question: does a crate use unsafe code? If you have pointers for a better way I should be doing that, I'd much appreciate it!
Thanks for making such a useful library! ❤️
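For what it's worth, the question reduces to a recursive walk that returns true as soon as it meets an unsafe block. Here is a minimal sketch over a toy expression tree. The `Expr` type and `contains_unsafe` function are assumptions made up for this example and do not reflect syn's AST, whose fields would need to be public (or exposed through a visitor) for the same walk to work.

```rust
/// Toy expression tree; an assumption for this sketch, not syn's AST.
pub enum Expr {
    Block { is_unsafe: bool, stmts: Vec<Expr> },
    Call(Vec<Expr>),
    Lit,
}

/// Returns true if `expr` is, or contains, an `unsafe` block.
pub fn contains_unsafe(expr: &Expr) -> bool {
    match expr {
        Expr::Block { is_unsafe, stmts } => {
            *is_unsafe || stmts.iter().any(contains_unsafe)
        }
        Expr::Call(args) => args.iter().any(contains_unsafe),
        Expr::Lit => false,
    }
}
```

A full answer for a crate would also have to look at unsafe functions, unsafe impls, and unsafe traits, which this sketch does not model.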