
unseemly's Introduction


Unseemly typechecks the code that you wrote, not the code your macros wrote. This makes macros feel like part of the language, not something tacked-on.

For a more complete pitch, see http://unseemly.github.io

Unseemly has a bare minimum of forms necessary to bootstrap the implementation of practical languages.

Unseemly is still pretty early-stage, so, while all of the features below exist, there are still a number of things that are janky or incomplete.

Features

From the ML family

  • Algebraic types (i.e., supports structs and (rich) enums)
  • Typesafe destructuring with match.
  • Generic types (or parametric types) (e.g. List<T>)
  • Recursive types

From the Scheme family

  • Syntax quasiquotation ('[Expr | … ]' quotes an expression, but inside that, ,[Expr | … ], evaluates its contents and interpolates them)
  • Pretty-printing respects macro invocations and quoted syntax (the pretty-printer is rather limited at the moment, though)
  • Hygienic macros (all operations respect α-equivalence)
  • Macro By Example (easily implement n-ary forms without writing boilerplate loops).

Unusual features

  • No type errors in generated code (if a macro invocation typechecks, the code it expands to doesn't need typechecking)†.
    • Typechecking under syntax quotation (so '[Expr | (plus one ,[e1],)]' is a type error if e1 has the type Expr<String>)
  • Extensible parsing and lexing (for embedded languages like SQL and regex).

† Except that there are known soundness bugs.

Other features

  • Full-featured REPL, with persistent command history and line editing (courtesy of rustyline).

How to use it

Install Rust, if you haven't already:

curl https://sh.rustup.rs -sSf | sh

From your Unseemly repository directory, run an example program:

cargo run --release examples/sum_list.unseemly

(Recommended) Get the default prelude for the Unseemly REPL:

cp examples/.unseemly_prelude ~/

Start the REPL:

cargo run --release

Documentation

Look at core_language_basics.md for documentation of the language.

Related work

Research projects

Unseemly is most closely based on Romeo, which descends from FreshML. (Romeo is closer to Pure FreshML, but the "Pure" part is not present in Unseemly.) Romeo allowed for manipulation of syntax types with complex binding information, but

  • syntax was otherwise untyped
  • there was no macro system (so the syntax manipulation was pointless!)
  • it was just a core calculus

SoundX is a language with syntax extensions in which typechecking occurs before expansion. It provides sound language extensions, but

  • it doesn't support binding annotations (in practice, this means that syntax extension authors wind up writing specifications that contain logic-y things like x ∉ dom(E).)
  • the language extensions aren't macros (they're not themselves part of the language)
  • it is just a core calculus

(TODO: are the extensions themselves statically verified to be type-preserving? I think so, but I don't remember for sure.)

Practical languages

If I understand correctly, Scala's blackbox macros are typechecked before expansion, but they can't do everything that whitebox macros can. Unseemly macros are typechecked before expansion, but are the only macro system needed, because they can (in particular) define new binding forms safely. (TODO: learn more about Scala's macro system)

Wyvern's primary motivating example (write SQL, not strings containing SQL, in your general-purpose code) is a lot like Unseemly's vision of inline syntax extension. Wyvern is a full-fledged language, not a core language. I believe that writing new embedded languages in Wyvern is not as easy as defining a macro.

Wyvern also includes a number of features that are outside the scope of Unseemly.

(TODO: learn more about Wyvern)

Terra, from a quick glance (TODO: learn more), appears to be a language with a close relationship to Lua, similar to the relationship that Unseemly-based languages would have.

In this case, it looks like the goal is to marry a high-level and low-level language together, without an FFI and with inline embedding.

MetaOCaml is an extension to OCaml that supports type-safe syntax quotation. Though Unseemly uses "quasiquotation" to describe syntax quotation, it is more similar to MetaOCaml's bracket than to Scheme's quasiquotation, because it respects alpha-equivalence.

Unlike Unseemly, MetaOCaml doesn't use its safe syntax quotation to implement a macro system.

Rust and SweetJS are non-S-expression-based languages with macro systems that allow rich syntax.

Unseemly is implemented in Rust, and it uses lots of macros.


unseemly's Issues

Merge `ty_compare::Canonicalize` and `ty::SynthType`

I think that ty_compare::Canonicalize and ty::SynthType are actually doing the same thing to types (though SynthType also operates on expressions); if so, we should remove Canonicalize (making sure not to call it, of course). If it winds up being the case that ast_walk forces us to keep it around to be paired with Subtype, we should still change get_walk_rule to force them to behave the same way.

Also, ty_compare::resolve is not the same thing, because it needs to deal with underdetermined types... but adding underdetermined type handling to ty::SynthType wouldn't cause any harm, and might let us merge that, too. It also needs to stop after one step; maybe we can write ast_walk::walk_one_step that uses some magic to prevent recursive calls?

Support multi-file programs

Ideally, we should have a simple primitive out of which "import this module in another file" can be built; then we can start writing a standard library.

Implement existential types

We already have universal types; adding existential types is probably pretty similar. I believe that something like traits should be macro-implementable with existential types (having some way to make types nominal would probably help shield users from seeing a type that looks like ∃T({add: [T T -> T], subtract: [T T -> T]}, T)).

Typing needs to be aware of the length of repetitions

Imagine the following:

'[Expr | (two_argument_function x ...[ ,y, >> ,[y],]...) ]'

It is well-typed only if y is repeated exactly once. Currently, the type system is incapable of even expressing a type with that constraint (it will "try out" typing with one copy and, seeing that it works, say that it's okay).

To be maximally generous requires implementing type-level natural numbers, the kind system, and probably a lot of other stuff.

A simpler (and probably sufficient for most purposes) solution would be to say that, for type judgments that use get_rep_res (or whatever) to get a Vec (and be able to index into it, look at its length, etc.), attempting to interpolate repeated syntax is a type error. But a type judgment may instead say "walk these terms together; I don't care about any other aspects of their repetition" (which is similar to how march_parts works already), and, if those terms are (by parsing) produced in sync, typechecking can proceed. I hypothesize that this is how people usually want to use repetition in macros anyways.

At first, we could only allow repetition "alone", but there's no reason we can't expand to allow dotdotdots adjacent to concrete terms as long as all the things being marched together have the same structure.

Parsing: Replace `Biased` with a priority system.

Nested Alts and Biaseds are (a) hard to understand, and (b) impossible to add to, making ::=also limited in what it can do.

Instead:

  • Make ::=also assume that there's a top-level Alt and add to it, rather than adding another layer of nesting.
  • Get rid of Biased, and instead have some sort of Priority wrapper that sets a rule's priority.

That way, rules can be usefully prioritized when extending grammars.

This shouldn't be too hard. It will require adjusting LocalParse, but that's just a matter of changing JustifiedByAtomPlanB and thinking through how prioritized and non-prioritized rules relate to each other. (Maybe plain rules have priority 0, and you can set priorities both positive and negative?) Might also need to think about when a partial parse "loses" its priority.
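
To make that concrete, here's a minimal Rust sketch of the flat-Alt-plus-priority shape. The FormPat here is a simplified stand-in for the real grammar type, and Priority, priority_of, and extend_alt are hypothetical names:

    // Simplified stand-in for the real `FormPat`; `Priority` is the hypothetical wrapper.
    #[derive(Debug)]
    enum FormPat {
        Literal(&'static str),
        Alt(Vec<FormPat>), // flat: one top-level list of alternatives
        Priority(i32, Box<FormPat>),
    }

    // Default priority is 0; extensions may go positive or negative.
    fn priority_of(rule: &FormPat) -> i32 {
        match rule {
            FormPat::Priority(p, _) => *p,
            _ => 0,
        }
    }

    // What `::=also` might do: append to the top-level `Alt`, then order by priority.
    fn extend_alt(alt: &mut FormPat, new_rule: FormPat) {
        if let FormPat::Alt(rules) = alt {
            rules.push(new_rule);
            rules.sort_by_key(|r| std::cmp::Reverse(priority_of(r)));
        }
    }

    fn main() {
        let mut grammar = FormPat::Alt(vec![FormPat::Literal("if")]);
        extend_alt(&mut grammar, FormPat::Priority(1, Box::new(FormPat::Literal("match"))));
        println!("{:?}", grammar); // "match" is now tried before "if"
    }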

Remove `Ty`

It was originally something else (don't really remember what), but now it's just a thin wrapper around Ast, and it does nothing but get in the way. There's a lot of random converting back and forth between Ty and Ast currently that would get deleted in the process.
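
For illustration, the situation being described is roughly this shape (Ast is a stand-in here, and synth_type and subtype are made-up signatures):

    // `Ty` today: a newtype around `Ast` that forces conversions at every boundary.
    struct Ast;
    struct Ty(Ast);

    fn synth_type(_expr: &Ast) -> Ty { Ty(Ast) }
    fn subtype(_sub: &Ast, _sup: &Ast) -> bool { true }

    fn main() {
        let t: Ty = synth_type(&Ast);
        // The wrapping/unwrapping that removing `Ty` would delete:
        let _ok = subtype(&t.0, &synth_type(&Ast).0);
    }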

Add `Item` nonterminal

Eventually (#4), the user should be able to define arbitrary new nonterminals, but that has some complicated design uncertainties. In the meantime, lots of normal languages have Item, a nonterminal that defines a new name (or names) but (unlike Pat) doesn't expect a context element and therefore doesn't need to be handled in order. It's common for all the bindings provided in a list of items to be bound in all of the items.

Implement typed macros

It should be possible to define a macro (e.g.):

forall T . '{ (lit if) ,{Expr <[Bool]< | cond},
              (lit then) ,{Expr <[T]< | then_e},
              (lit else) ,{Expr <[T]< | else_e}, }'
 conditional -> .{ '[Expr | match ,[Expr | cond], {
                                +[True]+ => ,[Expr | then_e],
                                +[False]+ => ,[Expr | else_e], } ]' }.

...and then invoke the macro:

if (zero? five) then three else eight

...which should expand to the expected code.

Furthermore, if the macro definition is wrong in a way that would expand to ill-typed syntax, it should fail typechecking in an appropriate fashion. And if the macro invocation is wrong in such a way, the invocation should be a type error that does not expose the implementation of the macro or its expansion.

Verify that names are bound at the right type

If one writes a macro that promises to bind a to Int for a particular argument term, soundness requires that this actually takes place.

(This corresponds to the difference between FreshML and Pure FreshML, but the Pure FreshML (and Romeo) solution to this problem is an unacceptable burden to the programmer.)

We must make a conservative approximation; there's no practical way to allow all cases where the programmer could prove that they would do so. In particular, I think we want to take advantage of the fact that macros are usually built by calling other macros, not by calling functions.

I think that the Rust borrow checker is the rough kind of thing this needs (however, this should be a simpler problem).

Language Server

Perhaps another open-ended question, but in your mind, how feasible do you think general editor completion support could be for unseemly?

I have this pipe dream in my mind that a language server / command-driven compiler could give you hints about what forms and types are valid for autocompletion by referencing the macro definitions themselves, thereby ensuring that you only have one core language server that is able to make completions for any range of DSLs within unseemly.

Is that completely crazy to consider at this point? Is it feasible?

Thanks!

Fix `pre_match`, somehow

There's something wrong with pre_match; I just wish I knew what.

For starters, it's named after when it gets called, not what it does. But mainly, everything associated with it is kinda a mess.

match_dotdotdot should really be recursively calling walk directly, so that it doesn't have to violate our principles and edit user code before checking it. (I'm not totally sure this is possible; it's kinda operating in the middle of an EnvMBE; can we call walk under those circumstances?)

Also, we have an unsettling workaround at the beginning of ast_walk to avoid calling pre_match under some circumstances that I no longer really understand:

    let (a, walk_ctxt) = match *a {
      // HACK: We want to process EE before pre_match before everything else.
      // This probably means we should find a way to get rid of pre_match.
      // But we can't just swap `a` and the ctxt when `a` is LiteralLike and the ctxt isn't.

      ExtendEnv(_,_) => { (a.clone(), walk_ctxt.clone()) }
      _ => Mode::D::pre_walk(a.clone(), walk_ctxt.clone())
    };

I think that the solution may be to extend get_walk_rule to be able to look at the context, too (and maybe to have, in addition to Custom, a CustomThenRewalk or something). I have not thought this through.

Pass `Assoc` by reference to functions

It’s cheap to clone, but probably not that cheap, and it’s getting cloned all over the place. Also, I suspect that it could simplify some closures to only need a reference.
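
A minimal sketch of the change, with a toy Assoc (just an Rc-wrapped Vec, so clones are cheap but not free):

    use std::rc::Rc;

    #[derive(Clone)]
    struct Assoc<K, V>(Rc<Vec<(K, V)>>);

    // Before: taking `Assoc` by value forces a clone at every call site.
    fn synth_type_owned(env: Assoc<String, i32>) -> usize { env.0.len() }

    // After: borrow it, and clone only where a closure genuinely needs to capture it.
    fn synth_type(env: &Assoc<String, i32>) -> usize { env.0.len() }

    fn main() {
        let env = Assoc(Rc::new(vec![("x".to_string(), 5)]));
        let _ = synth_type_owned(env.clone()); // refcount bump per call
        let _ = synth_type(&env);              // no clone at all
    }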

Factor out a language library

Secretly, there are two parts of the Unseemly implementation. There are the core*forms.rs files (plus, it seems, ty_compare.rs, which I guess should be renamed?), which define the Unseemly language itself, and then there's everything else. If we break them apart, we will have a library for implementing languages!

Now, that library is pretty oriented towards the particular kind of language Unseemly is. And the whole point of Unseemly is that you should be able to implement lots of different kinds of languages as Unseemly macros! So, I don’t expect the library to be useful outside of Unseemly. But I think it’s valuable to expose and refine the latent structure of the project.

Also, the library needs a name; what should it be?

Implement kinds

Maybe we don't need higher-kinded types, but the way that the type environment currently works, it's possible to define T as Int, and then use (plus T T) as a well-typed Int expression that will fail to execute.

Convert `ast!` to `u!` in tests.

The u! macro is way terser and way less error-prone (despite being defined in a file called flimsy_syntax.rs). Pretty much every occurrence of ast! in a test should be replaced by u!, unless it's intentionally constructing an Ast you can't get by parsing, or something that u! can't handle, like extend_syntax. (Outside of tests, it seems perhaps a little too do-what-I-mean-ish though.)

For example, the following:

ast!({"Expr" "some_expression" : 
    "frob" => {"Expr" "apply" : "rator" => (vr "x"), rand => [(vr "a"), (vr "b"), (vr "c")]},
    "blorp" => [@"c" (, some_pattern), some_atom],
    "blorp_arms" => [@"c" {"Expr" "panic" :} {"Expr" "relax":}],
    "poip" => (import [* ["blorp"]], (, whatever))})

...can be simplified to:

u!({some_expression : 
    {apply x [a; b; c]}
    [(, some_pattern) {panic :}; (at some_atom) {relax :}]
    (, whatever)})

Imports and (-- ...) and (++ ...) all disappear; barewords are used instead of strings; atoms need (at ...) wrapped around them (but only if they're being used on their own in a pattern!); and variable references don't need (vr ...). Arguments must appear in the order (and grouped together the same way) they appear in the grammar (hence the collapse of "blorp" and "blorp_arms" above).

Add more core forms

So, making a usable language out of a minimum number of core forms is a cool party trick (and a worthwhile one! We should do it as an example, and build_a_language is a start), but Unseemly needs some “unnecessary” core forms for the same reason it has numeric values built in instead of an awesome Church-numeral-based library. That is to say, we actually care about performance and debuggability.

We need

  • a top-level “bag of definitions”
  • function definition
  • side-effects and some kind of loop
  • let (for a while, I was sad because I thought we needed type-inferring let because of type erasure, but it turns out that match does the trick. But we still care about speed and debuggability.)
  • if-else (in the short term, it’ll also help with the error message for how you have to have an else when you’re using it as an expression; in the long term, macros should be able to express good errors)

Oh, and, come to think of it, there’s also some sort of macro-time error-emitting form that we need and can’t derive.

Also, we should check https://docs.racket-lang.org/reference/syntax.html and make sure we haven’t missed anything.

Build a "reification shortcut" for `FormPat`

Just like how Ast has a special case in Value (AbstractSyntax), and a special-case implementation of Reifiable, FormPat should get something analogous so that the constant reification and reflection in core_macro_forms.rs doesn't have to do as much work.

This should be very similar to what commit ea2a0d9 did for Ast.
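
A rough sketch of the shape this could take, using stand-in types (the ParseTree variant and these trait signatures are illustrative, not the real ones from the codebase):

    use std::rc::Rc;

    #[derive(Clone, Debug)]
    enum FormPat { Literal(String) }

    #[derive(Clone, Debug)]
    enum Value {
        Int(i64), // (stands in for the other ordinary value variants)
        ParseTree(Rc<FormPat>), // hypothetical shortcut, analogous to `AbstractSyntax(Ast)`
    }

    trait Reifiable {
        fn reify(&self) -> Value;
        fn reflect(v: &Value) -> Self;
    }

    impl Reifiable for Rc<FormPat> {
        // O(1): hand the Rust value across directly, instead of translating it
        // structure-by-structure into ordinary Unseemly data.
        fn reify(&self) -> Value { Value::ParseTree(self.clone()) }
        fn reflect(v: &Value) -> Self {
            match v {
                Value::ParseTree(fp) => fp.clone(),
                _ => panic!("ICE: expected a reified FormPat"),
            }
        }
    }

    fn main() {
        let pat = Rc::new(FormPat::Literal("if".to_string()));
        let _roundtrip: Rc<FormPat> = Reifiable::reflect(&pat.reify());
    }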

Two different types often get named `parts`; rename one of them.

parts is a commonly-used variable name (or variable name suffix) for both EnvMBEs and for LazyWalkReses. This is confusing! We should use it only to refer to EnvMBEs, and change all occurrences of parts referring to LazyWalkReses to something else. Doesn't really matter what. Maybe lwrs. Maybe walker.

(Also, LazyWalkReses is kinda ... a big pile of stuff. Maybe it should be refactored apart somehow, but it seems to work pretty well as a unit, so maybe not.)

MBE reform

First:
Don't store repeats together. Instead of

   leaves: Assoc<Name, T>,
   repeats: Vec<Rc<Vec<EnvMBE<T>>>>,
   leaf_locations: Assoc<Name, Option<usize>>,
   named_repeats: Assoc<Name, Option<usize>>,

have

  leaves: Assoc<Name, T>,
  repeats: Assoc<Name, Vec<EnvMBE<T>>>,
  repeat_groups: Vec<HashSet<Name>> // Each set is a group of names that must have the same length

or even

  data: Assoc<Name, Stars<T>>
  repeat_groups: Vec<HashSet<Name>>

where

enum Stars<T> {
  Leaf(T),
  Rep(Vec<Stars<T>>)
}

or

struct Stars<T>(
  Vec<T>,
  Vec<usize>, // some kinda packed representation of repeat structure I haven't figured out yet
);

(maybe we should drop the concept of named repeats entirely, but this issue is supposed to be semantics-neutral)

Then:
Let's make MBE-related operations more composable.
Instead of parts.map_reduce_with(parts2, mapper, reducer), we should write EnvMBE::zip(parts, parts2).map(mapper).reduce(reducer), as sketched below.

If this makes performance significantly worse we should (sigh) introduce something analogous to an iterator for MBEs.
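
Here's a toy sketch of that composable style over a plain Vec rather than a real EnvMBE (Zipped and its methods are invented for illustration):

    // Build the zipped structure first; `map` and `reduce` become separate, reusable steps.
    struct Zipped<T, U>(Vec<(T, U)>);

    impl<T, U> Zipped<T, U> {
        fn zip(lhs: Vec<T>, rhs: Vec<U>) -> Self {
            Zipped(lhs.into_iter().zip(rhs).collect())
        }
        fn map<R>(self, mapper: impl Fn(T, U) -> R) -> Vec<R> {
            self.0.into_iter().map(|(t, u)| mapper(t, u)).collect()
        }
    }

    fn main() {
        // parts.map_reduce_with(parts2, mapper, reducer) becomes:
        let total = Zipped::zip(vec![1, 2, 3], vec![10, 20, 30])
            .map(|a, b| a + b)
            .into_iter()
            .fold(0, |acc, x| acc + x); // the "reduce" step
        assert_eq!(total, 66);
    }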

Also: I think that MBE in this name doesn't really make sense. The various things that we're calling "dotdotdot" have more to do with Macro By Example than this structure does. Maybe StarEnv? Starsoc?

And then (#38) it seems like we'll replace Vec with ExampleVec... hope this actually winds up being a simplification!

Revise the implementation of splicing

Looking back over the splicing code, it's... deeply confusing. Its tendrils reach everywhere, and I don't really understand it. I think it might be best to try to come up with a different way of doing it from scratch.

The epicenter of the problem is, I think, heal__lwr_splices in ast_walk.rs.

The general idea of splice healing is that, during a LiteralLike walk, we encounter (via some kind of interpolation) a node that represents n of something. So, the grammar says Star(VariableReference), but you have a single thing (probably a name of some kind) representing n variable references. You need to evaluate that single thing (which might be inside, say, some new bindings, or some phase shifts), but then rebuild the EnvMBE representing those multiple things as a "real" repetition.

First step is to come up with a plan. Second step is to check that the plan is different from what we're currently doing.

If we're lucky, resolving issue #38 might open up a new and better way of doing things.

Implement tuples

Tuple types already exist, but tuple values can currently only be constructed and destructured via parsing and syntax quotation. There should be an actual tuple literal and an actual tuple pattern. Tuples are stored as a Sequence, and the literal and pattern can be modeled after those for structs.

Freshen less

Currently, ast_walk freshens every Ast element (maybe we should call the different kinds of Ast "nodule"s?) it traverses. This is usually unnecessary! For example, if one is evaluating or typechecking let x = foo; in bar, you don't need to freshen x; you're about to add it to the environment, potentially overriding anything else with that name.

When do we need to freshen? Typechecking [Expr| let x = foo; in bar] is okay, since it's just like typechecking outside a syntax quotation. Even evaluating that expression is okay, even though we're not adding x to the environment, because we're not going to look at the environment. But, if we're evaluating a pattern like match some_syntax { [Expr| let x = foo; in ,[ bar_syntax ],] => ??? }, then we must freshen the context element, because then we're breaking apart a binding structure, "exposing" the "free names" to the "open air".

The other time we need freshening (that I can think of) is when we're doing a type comparison and we hit forall X . ??? and forall Y . ???, and we need to mutually-rename to keep proceeding in the shared environment (since we're looking for alpha-equivalence).

Is there any way to simplify alpha.rs, which is desperately in need of simplification? I can't see anything; it seems like we need real freshening.

Make it easy to generate well-formed Unseemly ASTs.

Currently, the ast! macro makes it super-easy to generate an AST that's missing a crucial import, and it's quite verbose. We need a FormPat-driven way to make an AST that the parser would actually output.

Introduce `Environment`

Environment (or maybe just Env) will live in util/ and correspond to the env, less_quoted_env, more_quoted_env, and phaseless_env members of LazyWalkReses. It'll resemble (and perhaps replace, though it is more powerful than it needs to be for this) Ren in alpha.rs. Essentially, it's an Assoc that's aware of quotation levels ("phases").

eval(), synth_type(), etc. will all take an Environment instead of an Assoc. We'll have a macro, env_p0!() which will generate an environment with only phase-0 contents.

We'll need to think about whether Clo should contain an Environment instead of an Assoc, but that's not part of this issue. In fact, this change should highlight a number of places that should get TODOs to re-examine whether they should be using an Environment instead of an Assoc.
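
Based on the member names above, Environment might look something like this (the Assoc here is a toy stand-in for the real one in util/):

    #[derive(Clone, Default)]
    struct Assoc<V>(Vec<(String, V)>); // toy stand-in for the persistent Assoc

    impl<V> Assoc<V> {
        fn find(&self, name: &str) -> Option<&V> {
            self.0.iter().rev().find(|(n, _)| n == name).map(|(_, v)| v)
        }
    }

    // An Assoc that is aware of quotation levels ("phases").
    #[derive(Clone, Default)]
    struct Environment<V> {
        env: Assoc<V>,              // the current phase
        less_quoted: Vec<Assoc<V>>, // one per level of unquotation
        more_quoted: Vec<Assoc<V>>, // one per level of quotation
        phaseless: Assoc<V>,        // bindings visible at every phase
    }

    impl<V> Environment<V> {
        fn find(&self, name: &str) -> Option<&V> {
            self.env.find(name).or_else(|| self.phaseless.find(name))
        }
    }

    fn main() {
        let mut e: Environment<i32> = Environment::default();
        e.env.0.push(("x".to_string(), 5));
        assert_eq!(e.find("x"), Some(&5));
    }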

Fully-general syntax extension form

For the sake of complete flexibility, we want a way to say "Apply this SynEnv-to-SynEnv function to the current language", not just an "Extend the grammar like this" form.

Allow prelude languages, based on file extension

.≉ files will be the Unseemly core language, but we don't want to have to use some sort of import at the top of every other file just to allow comments and function definition. So, we'll create an (initially hardcoded) table mapping file extensions to languages.

In order to make this fast, we should serialize the language definitions to disk, rather than starting from scratch at startup every time.

Once this is done, I'd like to have .unpy for a Python-like language (implicitly-typed, significant indentation, list comprehensions), .unml for an ML-like language (but without H-M type inference, probably), and probably a third, simpler language for bootstrapping them from. (but that's not in the scope of this issue)
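
A sketch of what the hardcoded table could look like (the .lang paths, file layout, and serialization format are invented for illustration):

    use std::collections::HashMap;
    use std::path::Path;

    // extension -> pre-serialized language definition (hypothetical paths)
    fn language_table() -> HashMap<&'static str, &'static str> {
        HashMap::from([
            ("unseemly", "languages/core.lang"),
            ("unpy", "languages/python_like.lang"),
            ("unml", "languages/ml_like.lang"),
        ])
    }

    fn prelude_for(file: &str) -> Option<&'static str> {
        let ext = Path::new(file).extension()?.to_str()?;
        language_table().get(ext).copied()
    }

    fn main() {
        assert_eq!(prelude_for("examples/sum_list.unseemly"), Some("languages/core.lang"));
        assert_eq!(prelude_for("notes.txt"), None);
    }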

Remove `ddd` from `mbe.rs`

ddd_rep_idxes and everything that touches it wound up not getting used for ...[ ]... and :::[ ]:::, so it should be removed; it was kinda a cool idea, but it was janky as heck.

Generate tooling for sub-languages

Currently, the web IDE assumes that you're writing in Unseemly (and import won't work, besides). Allow the generation of web IDEs that assume a specific language. This should only ("only") require work on the HTML/JS/tooling side. Generate static syntax highlighters by default, rather than the goofy dynamic ones that support extend_syntax.

Also, allow generation of (static, of course) VS Code highlighters for new languages. For that matter, generate one for Unseemly!

Place `dotdotdot` (the quasiquotation construct) once per `Star` node, not once per nonterminal

dotdotdot should be able to go anywhere a repetition can happen. For example, it should be possible to write

'[Expr |  .[  ...[,a, >>  ,[a], : Int ]...   .  <something> ].]'

The only place that core_qq_forms::dotdotdot_form uses its nt argument is in the grammar: (named "body", (call_by_name nt)); that can be replaced by whatever syntax is inside the Star. But there's nothing to call "body"; the Star could potentially contain, say, three Named nodes. What do we do then?

This is a splicing-related can of worms that needs to be opened; the current behavior doesn't make a ton of sense.

Add a compilation phase, as an alternative to evaluation

The object language will probably be Rust, in order to be able to avoid reimplementing certain chunks of the runtime.

We will need to add a compile walk (expressing some object-language code) to every Form that defines an eval walk. That's pretty straightforward, if a little long.

We would need to modify the ast_walk infrastructure to reify the various things it does to the runtime values. In particular, we'd need a reified version of freshening. (If the object language is Rust, we could just use unseemly::alpha.)

We will need to reify Unseemly values into the object language. If the object language is Rust, can we just use the existing reifier? I think so, except that we're missing the ability to translate Unseemly types into Rust types, which we'll need for type annotations.

Of course, compiling to an actual intermediate language (like LLVM) would be better, but it would be a lot more work. Bootstrapping an actual compiler is possible, even if the Rust implementation remains an interpreter, but I suspect the sheer slowness of this implementation would overwhelm such an effort, in that case.

Implement extensible tokenization

It's possible to switch parsers mid-parse in a principled way; we should do the same thing with tokenization, especially because the default tokenizer is bonkers.

I think the most reasonable way to do this is to have syntax extension also (optionally) supply an alternate tokenizer.
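
One way the "extension optionally supplies a tokenizer" idea could be modeled (all of these names are invented; the real grammar machinery is more involved):

    // A tokenizer turns source text into tokens; extensions may swap it out.
    type Tokenizer = fn(&str) -> Vec<String>;

    struct SyntaxExtension {
        // ... grammar changes would live here ...
        tokenizer: Option<Tokenizer>, // None = inherit the enclosing tokenizer
    }

    fn default_tokenize(src: &str) -> Vec<String> {
        src.split_whitespace().map(str::to_string).collect()
    }

    fn tokenize_with(ext: &SyntaxExtension, src: &str) -> Vec<String> {
        ext.tokenizer.unwrap_or(default_tokenize)(src)
    }

    fn main() {
        let plain = SyntaxExtension { tokenizer: None };
        assert_eq!(tokenize_with(&plain, "a b"), vec!["a", "b"]);
    }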

Remove first-person singular language from comments

According to git grep "\bI\b" | wc, "I" is used 76 times in the codebase at the moment. Most of those should be eliminated, because it's often unclear how to edit comments written from someone else's point of view.

But I'm not sure what the best way to fix them is! A lot of them are of the form "I don't know X...", and do you just say "This code was written without understanding X"?

Handle co/contra/in-variance correctly.

Currently Cell<Int> is a subtype of forall T . Cell<T>. That's okay. But also Cell<Sequence<Int>> is a subtype of Cell<forall T . Sequence<T>>. That's bad, because you can't actually put a generic sequence into a Cell that wants a sequence of Ints!

I think that a solution looks like this:

  • Set the ExtraInfo for type comparison to std::cmp::Ordering (bonus: an easier way to reverse subtyping than swapping the main Ast and the context), and ... use it somehow. Do we need to look it up in every subtyping judgment? (a lot of them are LiteralLike, fortunately)
  • Somehow communicate which type applications are which. But what about forall T. T<Int>? Maybe we want three different versions of forall and three different versions of <...>? Ugh!

So, this might be (part of the) reason why "normal" MLs don't have an explicit forall type (and, I guess, insist that A be defined if you write A<B>): that way you can infer co/contra/in-variance and not have to make a big deal about it. So maybe we should do the normal thing.

Get rid of `literally` in `ast_walk::walk`

The whole "we need to look at the node that contains us to figure out whether we're being literal" is ugly, especially the corner case for what to do if not inside a node. Instead, I think we can either

  • Use a boolean that's part of the walk state
  • (probably better) Have two functions: walk and transcribe (or, I dunno, walk_literally).

Think about this a little more. I think of it as "this node enters literal mode" and "this node exits literal mode", but what about dotdotdot? Sometimes nodes are just temporarily nonliteral.

Anyways, it turns out that modes really do seem to "default" to literal or nonliteral in a coherent way.
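
For illustration, here's a toy version of the two-function option (a stand-in Ast; the real walk is generic over walk modes):

    #[derive(Clone, Debug)]
    enum Ast {
        Atom(String),
        Node(Vec<Ast>),
        Unquote(Box<Ast>), // temporarily re-enters non-literal mode
    }

    // Non-literal mode: actually process each node (here, just tag atoms).
    fn walk(a: &Ast) -> Ast {
        match a {
            Ast::Atom(n) => Ast::Atom(format!("{}_walked", n)),
            Ast::Node(kids) => Ast::Node(kids.iter().map(walk).collect()),
            Ast::Unquote(inner) => walk(inner),
        }
    }

    // Literal mode: copy everything verbatim, except `Unquote` drops back into `walk`.
    fn transcribe(a: &Ast) -> Ast {
        match a {
            Ast::Unquote(inner) => walk(inner),
            Ast::Node(kids) => Ast::Node(kids.iter().map(transcribe).collect()),
            other => other.clone(),
        }
    }

    fn main() {
        let quoted = Ast::Node(vec![
            Ast::Atom("kept".into()),
            Ast::Unquote(Box::new(Ast::Atom("x".into()))),
        ]);
        println!("{:?}", transcribe(&quoted)); // Node([Atom("kept"), Atom("x_walked")])
    }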

Parse zero repetitions of something correctly

When we assemble a Node from its constituent Asts, we should check to see if any Named names in its grammar are absent from the parse result so far. If so, we should store them as zero copies of a repeated term, because that's (presumably) what they are.

(They could also be under a branch of an Alt, but this seems like it's still an improvement in this case.)

See the TODO in earley::c_parse, in the Scope case.

End-to-end test that otherwise fails:

assert_m!(eval_unseemly_program("( .[ a : struct {} . a]. *[extra : five]*)"), Ok(_));

Allow subtyping at the syntax level

In order to use generics, we need subtyping. So an apply macro, instead of demanding that the function be of type [I -> O], should demand that it be a subtype of that. We definitely don't want this to be the case for all syntax arguments, so we need an explicit way of writing this. (also, we presumably will want the reverse at some point)

Would it be possible to just have an AnySubtypeOf<T> type? Surely that can't work...

Hygiene problem in macro invocations

This probably won't hit very often, but any hygiene violation is a soundness problem.

The implementation in a macro_invocation is separated from its environment at the time of Scope's syntax_syntax!, and then reunited at macro expansion time. Instead:

  • the Expand walk needs to be over Elts and have an environment that gets set by extend_syntax (so I guess that extend_syntax needs to invoke Expand? Or it needs to reify the environment and store it in the Ast?) from negative evaluation of the Syntaxes.
  • evaluating Syntax needs to produce an environment binding macros to closures that capture their environment. The current function (generating a FormPat) would need to be handled by binding a special name in the environment, it seems.

Implement enum subtyping (maybe)

Currently, following TAPL, when an enum value is constructed, it needs a type annotation to know what the "rest" of the arms are. (In practice, real languages look up the enum type from the choice name.) Instead, I think we may want to say that, e.g. +[Some 5]+ has the type enum { +[Some Int]+ }, which is a subtype of enum { +[Some Int]+ +[None]+ }.

This could be a minor convenience to the programmer, but the big thing that this gets us is that, when we are constructing a match in a macro, it'll probably be reasonable to do exhaustiveness checking because we'll have types that correspond to "partial" enums.
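
A toy sketch of the width-subtyping check this implies, representing an enum type as a map from arm names to argument types (all names invented for illustration):

    use std::collections::BTreeMap;

    type EnumTy = BTreeMap<String, Vec<String>>; // arm name -> argument types

    // `sub` is a subtype of `sup` if every arm of `sub` appears in `sup`
    // with the same argument types (width subtyping).
    fn enum_subtype(sub: &EnumTy, sup: &EnumTy) -> bool {
        sub.iter().all(|(arm, args)| sup.get(arm) == Some(args))
    }

    fn main() {
        let some_only: EnumTy =
            BTreeMap::from([("Some".to_string(), vec!["Int".to_string()])]);
        let option: EnumTy = BTreeMap::from([
            ("Some".to_string(), vec!["Int".to_string()]),
            ("None".to_string(), vec![]),
        ]);
        // enum { +[Some Int]+ } <: enum { +[Some Int]+ +[None]+ }, but not vice versa:
        assert!(enum_subtype(&some_only, &option));
        assert!(!enum_subtype(&option, &some_only));
    }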

Improve documentation by including concrete examples next to syntax definitions

Hi Paul,

I'm highly interested in understanding how unseemly works, and have been studying the notes and examples in the repo for an hour with the REPL open trying out different things. Currently, I'm having a lot of trouble figuring out some of the basics like what an expression is and isn't, and how, when, and where I am supposed to use types.

So, my suggestion: for newcomers, I think we could make the onboarding experience much easier by adding concrete code samples for each piece of syntax, so it's easy to see the syntax punctuation next to its usage.

For example:

## Expressions
* `(expr expr ⋯)` is function application.

Examples

    (plus one one) # => 2
    (zero? one) # => +[False]+

Adding things that can be put into the REPL for trying and remixing would be helpful to understand these other basic constructs:

* `.[ x: Type  ⋯ . expr ].` is lambda.

* `match expr { pat => expr  ⋯ }` is a pattern match.

* `+[Choice expr ⋯]+ : Type` constructs an enumerated value.
    The type annotation is weird, but it helps keep the typechecker simple.

* `*[component: expr ⋯]*` constructs a structure value.

* `forall X ⋯ . expr` abstracts over a type. It is typically used around lambdas.

* `unfold expr` pulls one layer of `mu` off a recursively-typed value.
    It is almost exclusively used for the scrutinee in `match`.

As an aside, I don't see ::= documented anywhere, yet it seems to be a building block of the language.

Thanks!

Allow interpolation of identifiers

We currently allow manipulation of Pat, Expr, and Type syntax; we need to be able to manipulate Atom (or maybe we should call it Ident -- I've never really decided whether VariableReference is a wrapper around Atom or a separate thing) in order to write something like:

'[Expr|  .[ ,[my_var], : Int . (plus ,[my_var], one) ]. ]'

In fact, I don't think it's possible to have n-ary function definition without this feature. (Or allowing functions to take patterns instead of parameters as input).

The trouble is that, if I'm writing (plus ,[my_var], one), how does the parser know whether I'm interpolating an Expr or an Atom? This is a syntactic ambiguity.

One option is to structure nonterminals such that Pat → Atom | ⋯ and AtomNotInPat → Atom; then Pat and AtomNotInPat could both be safely extended with ,[ ],. But then, if the author of a lambda macro forgot to use AtomNotInPat, they've laid an ambiguity trap. I think this strategy only works well enough for a temporary solution, but a temporary solution will do for now.

Allow user-defined nonterminals

Currently, the only nonterminals in Unseemly are Expr, Pat, Syntax, and Type. This is hardcoded in a number of places (some of which need to support Syntax still), meaning that a user cannot introduce a new nonterminal to the language.

This isn't an obstacle to creating totally distinct languages, as long as those languages don't need additional nonterminals. And I'm not totally sure whether one ever needs, say, two distinct Expr-like nonterminals. But, in principle, it seems like this ought to exist.

Remove the `quasiquote:` field from `Form`.

Instead, just use form.name == n("unquote") (and "dotdotdot"), since those are the only two forms with unusual treatment.

(We should also get a better notion of Form identity!)

Info: Implementing compilation targets

Hi again,

First, thank you for the prompt follow-up to #34. A lot of the syntax is starting to fall into place for me.

One of my favorite things about your approach is that your language comprises the minimum constructs needed for building a typed language out of macros. Therefore, I would imagine that, compared to other languages, the surface area for implementing alternative compile targets is smaller, since most of the language constructs live in macros.

From your perspective, is it any easier to implement other targets because of the current design?

I was wondering what you thought about exploring Cranelift as a backend. Alternatively, I would be interested in exploring the Erlang BEAM as a target since there aren't many typed languages targeting that ecosystem yet.

Perhaps this is just a conversation starter for now.
