smlfamily / successor-ml Goto Github PK

A version of the 1997 SML definition with corrections and some proposed Successor ML features added.

TeX 100.00%

successor-ml's Introduction

Successor ML

Successor ML is an effort to evolve the Standard ML language while keeping true to its clean and elegant design.

A related effort is the evolution of the Standard ML Basis Library.

Note that this repository used to be called Proposed-Definition-of-Successor-ML, but it has been renamed to Successor-ML. Github will automatically redirect the old URL, but you can also updating any existing local clones using the command:

$ git remote set-url origin https://github.com/SMLFamily/Successor-ML.git

Definition

The LaTeX sources for The Definition of Successor ML can be found in the definition directory.

Design proposals

Coming soon

Implementation efforts

At this time, there are three implementation efforts to support Successor ML.

The most complete is Andreas Rossberg's HaMLet S implementation, which implements all of the features described in the The Definition of Successor ML (and more).
The MLton implementation of Standard ML is working on supporting Successor ML features. See [http://mlton.org/SuccessorML] for details.
The Standard ML of New Jersey implementation of Standard ML is also working on supporting Successor ML features (as of Version 110.79).

successor-ml's People

Contributors

Stargazers

Watchers

Forkers

matthewfluet vishallama dawidvc atsampson ratmice rbarraud minoki piyush-kurur seanpm2001

successor-ml's Issues

fixity

We may as well get the inevitable fight about fixity over with. I think we can agree that the current situation is absurd, both practically and conceptually. Practically, because I don't think the compilers are compatible on this point, particularly as regards modules. And conceptually, because no one have ever succeeded to explain what a fixity declaration and/or specification is supposed to mean, mainly because it's not possible to explain it in those terms.

Fixity is about parsing, obviously. It's not about variables, it's about notation for them. Int.+ is a variable that is notated thus in the standard syntax. It's nice to have infix syntax for this sort of thing once in a while. We can live without it, but people are very attached to the notation for tactics, the original reason for ML to even exist, so it might be hard to part with it for sentimental reasons.

Because we seem very likely to continue to use character representations of programs, we have to do something if we're not to just rule it out entirely. I think Haskell has the best solution, with two notations for marking an identifier as either left- or right-associated infix. I think they use backward quotes and forward quotes; I don't know how this plays with the standard notation for characters though. (I'd like for 'a' to have type char, rather than the preposterous #"a" notation.)

The problem with infix "declarations" or "specifications" is that they are not declarations or specifications. This is well-understood, conceptually, but no reasonable solution has ever been proposed to my knowledge. No one knows how its supposed to behave with open or include, whether it's supposed to be in the signature or the structure or what. I don't think there's a good answer that isn't just arbitrary.

There is a conflict between notation, which has no scope, and modules, which are all about scope. I don't know what to do about this. The Haskell approach just avoids it entirely; the thing itsself is not infix or not, rather you have a way of saying "I am using infix notation now" for anything.

laziness

As an example of a pure extension to the language (one that does not break old code), we should certainly consider adding laziness. Wadler's proposal is probably a good starting point; there have been others.

The simplest is probably to introduce need bindings, which must be permitted to be (mutually) recursive. So we should be able to write something like the declaration need x = e where x has type tau susp if e has type tau. For the self-referential case, it would be need rec x = e, where now e has type tau under the supposition that x has type tau susp; the result is a tau susp. As usual, rec would have to distribute over and so that mutually recursive need cells are possible.

Then forcing suspensions would be integrated into pattern matching. Some sort of notation would be needed (ahem), for example !p in a pattern to indicate that the corresponding value is to be forced and its value matched against p, which could be an as or with pattern. There would need to be a clear sequential semantics for pattern matching so that it is clear what is forced when. As long as expressions can be impure, it is visible when things are forced and we cannot leave that undefined.

fixity

What is the current thinking on the vexed issue of fixity "declarations"? It's all such a mess. Why not adopt the Haskell approach of forward and backward quotes, or something similar, to provide an immediate form of fixity, rather than a context-sensitive declaration?

minor issue with the definition

The Definition as written cannot make up its mind whether it is working with abstract or concrete syntax. The distinction between "atomic patterns" and "patterns" is a case in point. I would say that the statics and dynamics is defined on abstract syntax, and eliminate all such holdovers from concrete syntax.

More convenient opaque ascription

Opaque ascription is a very useful feature, but one of its downsides is that selectively making types abstract in a large structure is very verbose. For example:

functor MakeQux (FB : FOO_BAR) :> QUX
  where type FB.foo = FB.foo
    and type FB.bar = FB.bar =
struct
  structure FB = FB

  type 'a qux = 'a bar foo list

  (* operations go here *)
end

I wish I could do the following instead:

functor MakeQux (FB : FOO_BAR) : QUX =
struct
  structure FB = FB

  (* locally a synonym, externally an abstract type *)
  newtype 'a qux = 'a bar foo list

  (* operations go here, unchanged *)
end

I propose the keywords newtype and neweqtype, which define a type with the following properties:

Inside the current structure, it's a type synonym.
Outside of the current structure, it's an abstract type or eqtype.

Relax string literal rules

The current rules of Standard ML forbid the direct inclusion of any characters outside of ASCII within a string literal. I propose that the rules be relaxed in Successor-ML so that conforming implementations may allow string literals to contain characters outside of ASCII for character encodings that are ASCII compatible, such as single byte encodings like ISO-8859-1 and multi-byte encodings like UTF-8. This would probably also necessitate changes in the Standard Basis' concept of strings, perhaps recording the exact encoding used in a particular string. Thoughts?

The type of top-level `~` should refer `num` rather than `realint`

The SML Basis Library specifies the type of top-level ~ as

val ~ : num -> num  (* default: int -> int *)

i.e. word is accepted.

Appendix E should be updated accordingly.

Parameterized signatures

As previously discussed, parameterized signatures would cut down on typical where type boilerplate. Given that we already have type constructors and functors it would actually be less surprising to users that signatures could also be parameterized.

Signature members would take us most of the way (e.g. functor MkSig (M: sig type t end) :> sig signature S = sig type t = M.t end end = ... and signature SIG = MkSig(type t = int).S but such encodings are still annoying.

End of line for line comments

In section 2.3: "A line comment is any character sequence between the comment delimiter (*) and the following end of line."

I think this is the only place where the spec cares about line endings. Is it worth being clearer about exactly what the "following end of line" is, to avoid ambiguity when processing source files with different line ending conventions? Perhaps something like "... and the next carriage return or linefeed character (or the end of the file)."

Lightweight threads that can use all CPU cores

It would be nice if Successor ML implementations supported lightweight, user-level threads that are multiplexed on top of OS-level threads, as is done by GHC and Go. This allows for simple, highly-performant concurrent code.

Void type

If a type t is void (has no values), any control flow path whose context contains a variable of type t is automatically unreachable. This is useful information for a compiler writer! Sadly, in Standard ML, the only way to construct void types is using recursion:

datatype void = V of void

So a compiler writer has to perform extra work just to establish that a type is void. Why not provide a built-in void type that everyone can tell has no values without doing any work?

signature VOID =
sig
  type void
  val absurd : void -> 'a (* pure, total function *)
end

Foreign Function Interface

Perhaps one of the few essential parts of allowing cross-implementation SML code to be written is a standardized Foreign Function Interface. This is because an FFI is needed if one wishes to use features that are provided by third-party libraries or the operating system, but are not provided by one's implementation.

A good FFI should (note that these are my personal views):

allow calling almost all C functions.
allow for reading and writing C data structures.
allow for exporting functions that can be called from C.
support for cleanup of foreign resources when SML values become unreachable.
not require users to write C stub code by hand. Generating C automatically during compilation is allowed.
be independent of implementation details, such as the representation of SML values.
not cause unnecessary performance degradation.
support SML code being compiled into a library that is called into by foreign code.

MLton has an FFI that I believe supports all of these. I am not familiar with any others.

constructors

Another thing that I am sure has been proposed and maybe implemented is a better treatment of constructors vs variables. I'm all in favor of symbolic-or-capitalized constructors, and lower-case variables. As it stands prefixing almost any program by "datatype x = x" will wreak havoc.

A related point is that it's kind of silly to lexically distinguish type variables, which need no lexical distinction, and not lexically distinguish constructors, which do. I'd love it if we could just use a, b, c etc as type variables, and not always have to write the stupid tick mark.

revisions

I wonder what lessons can be learned, and ideas adopted from, SML#. They have done a lot of work there, and I wonder more generally what should be or could be our relation to that project? I guess it would be hard to combine forces, but we can certainly learn from them, with attribution, at the very least.

functor syntax

Personally I find the syntax for functors disheartening, especially when "sharing" and "where" are used. I never know how to indent, and I never find a way to make it look nice.

I think there is general agreement that the parameter of a functor should be a specification, so that it's a kind of keyword style. SML97 admits that and a positional style, but I think we should just delete the latter entirely. Yes, it rules out F(G(X)), but how often does that come up in practice?

When the realization syntax is so heavy ("where type" instead of "where", and perhaps with "and"), it's all-but-impossible to write things neatly. Lightening the syntax would help, but not solve the problem. Maybe it's not solvable, but it sure would be nice to figure out something cleaner in the common case.

A related point is that I was once an advocate for the :> ascription (well, not that notation, but that concept), but in teaching and in practice I find it to be far more trouble than it's worth. It's easier to confine abstraction to the data types, and use that. (We could even adopt the "newtype" syntax from Haskell for the single-constructor case, but I'm not sure it's a good idea.) Are there any remaining advocates for :>?

Laziness

I like laziness; I don't like lazy languages. It would be nice to have a smoothly integrated way to do lazy computation within Successor ML. It's a 10% thing, but if we can make that nicer, it would be great.

Wadler wrote down a proposal a while back that consolidated many extant proposals at that time. I haven't thought it all through, but off-hand I would suggest a "need" declaration (parallel to "val" and "def" declarations mentioned separately) that would have a memoizing semantics. Need cells can and must be recursive, so rec would have to be permitted. I would guess that a need variable is forced whenever it arises in an evaluation position. Maybe need variables could be lexically distinguished to make it more evident when this is happening --- there is nothing worse than the impossible-to-understand evaluation strategy of a lazy language, but we can do better in our context.

rethinking data types

At the level of more radical changes (which therefore may never happen because of concern for compatibility), I would love to see the data type mechanism replaced by a more general mechanism for inducing implementations of abstract types. I described this in my MLW talk a couple of years ago. Roughly, one introduces the concept of a data signature, a special form of signature for which a default implementation is specified by elaboration convention. If DT is a data signature, one can write "data structure D : DT", or "data structure D :> DT", to induce the default implementation. Another characteristic of a data signature is that it provides hooks for pattern compilation. This allows for non-default-implemented modules to participate in pattern matching --- one can provide one's own implementation of a data signature, and still do pattern matching against it.

Such a thing could be implemented without changing what is already there, I think, but being intended as a replacement, it would not sit nicely in that there would then be two means of achieving very similar ends. That would be the price of evolution, I would guess.

Restrictions on disjunctive patterns

The Definition of Successor ML specifies that the pattern pat1 | pat2 must be irredundant, which is defined that there must exist a value that is matched by pat2 but not pat1. After discussions with Dave MacQueen, I wonder if the restriction should be that pat1 and pat2 are required to be disjoint; i.e., that there is no value that is matched by both patterns.

This restriction has the property that it makes the | pattern constructor commutative, which I think is intuitive.

Evolution vs. revolution

One Meta issue that we ought to discuss is how much breakage we are willing to tolerate. The Successor ML features that were discussed in the previous incarnation of this process (and which are documented in the Successor ML Definition) are mostly backward compatible (i.e., most existing SML programs won't break). I'm not opposed to greenfield (or is it brownfield?) language design, but that places a much bigger burden on implementors (and users).

modular type classes

Unsuprisingly, I am fond of modular type classes --- the implicit application of functors along designated paths of composition to obtain a structure of a specified signature in a "standard" (by declaration) way. It's a purely elaboration-time issue, and would not disrupt old code, I don't think.

Modular type classes could then we used to displace the old equality type mechanism, which I think is pretty broken, especially because it doesn't work with abstract types (and its defaults for data types are pretty much useless).

Smarter integer pattern matching

One of ML's greatest strengths is pattern matching, which allows us to perform sophisticated case analyses without ever branching on a boolean expression. When it comes to case-analyzing inductively defined types, no other general-purpose language is better than ML.

Unfortunately, ML is far less good at case-analyzing integers. Just last week I wrote the following datatype:

datatype 'a tree
  = Pure of 'a
  | Two of int * 'a tree * 'a tree
  | ThreeA of int * 'a tree * 'a tree * 'a tree
  | ThreeB of int * 'a tree * 'a tree * 'a tree
  | Four of int * 'a tree * 'a tree * 'a tree * 'a tree

and the following helper functions:

fun part (m, Two (n, a, b)) = if m mod 2 = 0 andalso n = m + 1 then SOME (a, b) else NONE
  | part _ = NONE
  
fun many (n, ab, cd) =
  case (part (n, ab), part (n, cd)) of
      (NONE, NONE) => Two (n, ab, cd)
    | (NONE, SOME (c, d)) => ThreeA (n, ab, c, d)
    | (SOME (a, b), NONE) => ThreeB (n, a, b, cd)
    | (SOME (a, b), SOME (c, d)) => Four (n, a, b, c, d)

What I wish I could have written instead is

fun many (2*n, Two (2*n+1, a, b), Two (2*n+1, c, d)) = Four (2*n, a, b, c, d)
  | many (2*n, Two (2*n+1, a, b), c) = ThreeB (2*n, a, b, c)
  | many (2*n, a, Two (2*n+1, b, c)) = ThreeA (2*n, a, b, c)
  | many args = Two args

The patterns 2*n and 2*n+1 already take care of matching two consecutive integers, the greater of which is the odd one. More generally, what I would want to see is

Pattern matching affine combinations of integer variables with constant coefficients. For example,
```
fun sameParity (x+y, x-y) = true
  | sameParity _ = false
```

Patterns that only match nonnegative integers. For example,

fun lessOrEq (n, n + nonneg k) = true
  | lessOrEq _ = false

Of course, the type checker should reject patterns that can be matched in two or more ways for the same input. For example, the snippet
```
fun wrong (2*x + 3*y, 4*x + 6*y) = ...
  | wrong _ = ...
```
is wrong because the matrix [2 3; 4 6] (sorry, MATLAB notation) is not invertible, whereas the snippet
```
fun okay (2*x + 3*y, 3*x + 2*y) = ...
  | okay _ = ...
```
is not wrong because the matrix [2 3; 3 2] is invertible.

Does this seem like a reasonable feature to want?

sharing and where clauses

I'm not sure of the current state of play, but the last I knew the syntax and semantics of structure sharing and structure realization was in a state of flux.

It seems to me, working from memory, that sharing and where should be about types. Sharing is a form of spec, and where is a form of sigexp. So, SIG where t = int would be a signature, for example, and type t type u sharing t=u is a form of spec.

Whether to allow "bulk" realization and sharing is questionable. I'm pretty sure the official definition is badly broken, and that compilers differ on what to do with them. Are these really all that important?

Signature members

It should be possible to nest signature declarations inside structures and specifications inside signatures. The OCaml Map module is just one example where this is quite natural compared to the toplevel pollution in the SML/NJ library with ORD_KEY and ORD_MAP.

Abstract signatures make module type checking undecidable, so those should not be added.

clauses

I wonder whether anyone has a suggestion for how to fix the discrepancy in the syntax between fun bindings and case's. As everyone knows, it is monumentally annoying that the former use equal sign, and the latter use double-arrow; switching a fun to a case or vice-versa is painful for no good reason.

Without knowing what has already been suggested or done, let me suggest some options:

Denigrate fun entirely. Except for super-small examples, I don't see a downside to this.
Migrate fun to use double-arrow syntax so that the clauses look exactly like case branches.
The same, but denigrate multiple clauses so that only fun f(x,y,z) => x+y+z would be permitted.

Bringing up this vexed question automatically brings up a few more. The most annoying is the needless shift-reduce conflict between two cases and between a case and a fun. Adding parens is very ugly imo, but ymmv. If we disallow clausal fun's, then one of these conflicts goes away immediately.

To manage the remaining shift-reduce conflict, why not allow curly braces as a form of parentheses for clauses? That way I could write case e of { p1 => e1 | ... | pn => en } if I'd like, but we needn't insist on it. The braces could be used more generally for bracketing any subgroup of clauses in a match, but I would guess the main use is as illustrated.

A related problem is the "fn" syntax, which also raises a problem with the scope of a match. Perhaps we can limit fn to have only one clause, so that fn (x,y,z)=>x+y+z is ok, but not a more general match? It would simply force the use of a case, which I find nicer anyway.

There is also the question of whether fn's should be name-able and self-referential, and whether one could or should admit mutually recursive fn's (which would define a labelled tuple, I suppose).

Incidentally, denigrating fun declarations entirely would have the advantage that we can use fun for lambda's, which I find nicer than the cramped "fn" (and much nicer than "function").

Has anything been done or suggested on these topics already? They seem so simple to fix, and it would certainly make things nicer syntactically.

"opening" datatypes

This is a proposal that could actually be added to Successor ML implementations (SML/NJ and MLton).

I was looking the MLton codebase and noticed code like this popping up everywhere.

datatype z = datatype Control.Target.arch
datatype z = datatype Control.Target.os
datatype z = datatype Control.Format.t
datatype z = datatype Control.codegen

I had no idea what the point of creating an "alias" and immediately shadowing it would be. I had to look at several examples to realize the purpose of this: it's to make the constructors of each datatype available without qualification.

It would be much clearer if, instead of a weird idiom, this could be expressed directly as:

open datatype Control.Target.arch
open datatype Control.Target.os
open datatype Control.Format.t
open datatype Control.codegen

Or even:

open datatype Control.Target.arch Control.Target.os
  Control.Format.t Control.codegen

Since datatype is a keyword, it can't be the name of a module, so there's no ambiguity with open for modules.

open

I think it's been suggested, and perhaps implemented, to allow or require "open X : SIG", rather than just "open X". I think this is a good idea.

Lack of expressivity of functors (type variables).

We present a method for reuse of implementation, where one
datastructure can be defined in terms of another, similar one. We
utilize SMLs module language to transform a given datastructure into a
new one.

Let's look at the set/map duality. If we have a map (aka. finite map,
dictionary, table, etc.) datastructure, then we can easily implement a
set in terms of it:

{ a, b, c } = { a:(), b:(), c:() },

that is a set of elements of type E can be represented as a map from
keys of type E to values of type unit (empty tuples).

Likewise, if we have a set datastructure, we can use it to implement a
map:

{ a:x, b:y, c:z } = { (a,x), (b,y), (c,z) },
a < b iff (a,) < (b,)

that is a map from keys of type K to values of type V can be
represented as a set of elements of type K*V, i.e. key-value
pairs. The ordering supplied to the given set implementation shall
compare the pairs by first constituent only.

Observing this duality prevents the common mistake of duplicate
implementation. The functions operating on sets and maps are
practically the same, they may be long, and there is many of them in a
rich set/map structure. Hence, one shall only implement a set
directly, and then use it to implement a map. Alternatively, one can
start by implementing a map, and implement a set in terms of it. In
StandardML one should be able to use a functor to easily implement one
datastructure in terms of another.

The preferred way would be to implement the set directly, and use a
functor to transform it to a map. Why set? Because it's slightly
easier and more elegant to implement - the internal datastructure only
needs to store one value - the element, instead of two - keys and
values.

However (due to a problem described later), we implement a map
directly and later transform it into a set.

The signatures used here are pretty much the same as ORD_KEY, ORD_SET
and ORD_MAP in SML/NJ.
*)

(* Signature for our ordered type: *)

signature Comparable = sig
type T
val compare: T*T -> order
end

signature MAP = sig
type Key
type 'v Map
val empty: 'v Map
val put: Key -> 'v -> 'v Map -> 'v Map
end

(* The map datastructure is implemented directly: *)
functor Map(comparable: Comparable) :> MAP where type Key = comparable.T =
struct
type Key = comparable.T
datatype 'v Map = Empty | Binary of 'v Map * Key * 'v * 'v Map
val empty = Empty

(* This implementation lacks any form of balancing, so a better map
would be even more complicated. *)

fun put k' v' Empty = Binary(Empty, k', v', Empty)
| put k' v' (Binary(l,k,v,r)) =
case comparable.compare(k',k) of
LESS => Binary(put k' v' l, k, v, r)
| EQUAL => Binary(l, k', v', r)
| GREATER => Binary(l, k, v, put k' v' r)
end

signature SET = sig
type Element
type Set
val empty: Set
val put: Element -> Set -> Set
end

(* The set datastructure uses map internally, so the many long )
( functions (like put) don't need to be repeated: *)
functor Set(comparable: Comparable) :> SET where type Element = comparable.T =
struct
type Element = comparable.T
structure M :> MAP where type Key = Element = Map(comparable)
type Set = unit M.Map

val empty = M.empty
fun put x s = M.put x () s
end

(* Some usage examples: *)
structure ComparableInt = struct
type T = int
val compare = Int.compare
end

structure IMap :> MAP where type Key = int = Map(ComparableInt)
structure ISet :> SET where type Element = int = Set(ComparableInt)

val set : ISet.Set = ISet.put 8 ISet.empty

(* Good, maps are polymorphic over values: *)
val i2s : string IMap.Map = IMap.put 8 "a" IMap.empty
val i2c : char IMap.Map = IMap.put 8 #"a" IMap.empty

To summarize, we implemeted the map directly, and then used it to
implement the set.

Now assume we want to go the other way around - first implement the
set directly. We encounter a problem when trying to use it to define a
map. It was easy taking a polymorphic type 'v Map, instantiate 'v to
unit and get monomorhic type of sets. But the reverse seems
impossible. How can one take a monomorhic set type and use it as a
polymorphic 'v Map? Note that the map functor shall not bind the value
type. The value type shall stay polymorphic, and only the key type
shall be bound in the resulting structure. So the map functor can only
take the key type as argument. In other words, we don't want
IntToStringMap.Map, but we want (string IntMap.map).

It's quite natural that one should be able to go both ways. Therefore,
I consider this as a shortcoming of StandardML - an example of a
slight lack of expressivity of the module language.

How could this be fixed in SuccessorML?

Unicode in SML text

As an extension of #29 , allow all characters with Unicode general category letter or number to be used for alphanumeric identifiers (*), all characters with Unicode general category symbol to be used for symbol identifier and all characters with Unicode general category number to be used for numbers (**). Take care of additional end of line characters in Unicode.

(*) With, as it is already, the addition of the underscore and numbers not allowed as the first character.

(**) Optionally and not Unicode related, would be nice to allow underscores in numbers like Ada do, as it helps readability: ex. 1_234_567

Segregation vs Integration

Presumably few of us are segregationists, but let me raise the question anyway. A number of years ago Philippe Ajoux did an undergraduate thesis with Guy and me on formulating a dialect of ML in which storage and I/O effects are segregated from expressions. (I don't believe there is any such thing as "an effect", nor do I think that exceptions or callcc are such. But allocation of exceptions is another issue, because they are storage effects.) If done properly (not like Haskell) it can be quite a nice thing, but I admit readily that it inhibits benign effects. (Having said that, performIO is NOT in our setting unsafe, and can be more easily embraced without difficulty or embarrassment.) The main thing that absolutely must change in a segregated variant is the top-level. One must distinguish between evaluating expressions and performing commands. (The Haskell kludge of automatically running an expression of type IO a is absurd in my view.)

I wonder whether there is any sympathy for segregation, or whether there is any reasonable way to admit both modes? Come to think of it, this raises a related point that I got from Felleisen, namely "language levels". Maybe we could have a segregated "language level", leaving the integrated form for "experts"? It would be good to have a "pure" language level in which the restrictions on polymorphism are not needed. Students invariably type, say, "nil @ nil", and get a preposterous error message that is all but impossible to explain.

Regular expressions

I've noticed there is dedicated issue tracker for the basis library, so I moved the issue there:

SMLFamily/BasisLibrary#14

Integer/Word constants within string constants

It would be nice if in addition to the decimal constant \ddd and \uxxxx
you could just embed an integer constant directly into a string constant prepended by \e.g. #"\0x33"
or something similar to the c syntax '\x33'

The Char.chr function works for this, but it cannot be used in pattern matches

[edit: I suppose just embedding the integer/word constants isn't terribly elegant due to \1x33 etc]

String interpolation

Not sure if this has been discussed before elsewhere, but I didn't see another issue on this repo about it.

It might be nice to have string interpolation available, where string literals are "interrupted" to include SML expressions that are inserted at that location. Here's an example of Swift's syntax for string interpolation, which I personally like:

"SELECT \(columnNames.joined(", ")) FROM \(tableName)"

How I'm imagining this feature is that it would only affect the grammar (and would be fully handled by the parser for implementations); an interpolated string literal would be syntactic sugar for either repeated uses of ^ or a use of String.concat.

Thoughts? Not the most important feature, but I just thought it might be kinda nice to have.

rec syntax

I like the improvement to the rec syntax, which was wildly out of control in Standard ML. However, let me lay down a marker saying that it may have to be revised again depending on what we do about my suggestions regarding "val", "def", "need", and "fun" declarations. Where "rec" is permitted should be tied to the semantics of the thing. Functions can be recursive, need cells can be recursive, but can general values?

Higher-order functors

Higher-order functors are an old and obvious extension.

I think the only problem is whether defunctorizers like MLton can support them. I suspect that the module language would still be normalizing (since it can be reduced to a subset of System F(omega)) so the defunctorizer would still terminate instead of infinite expansion. Extending the actual algorithms would surely be nontrivial.

line comments delimiter

Following the recent ideas in (ocaml pull-request discussion)[https://github.com/ocaml/ocaml/pull/671], I would like to propose another symbol for line comments - |*, since (*) conflicts with typical documentation generation tool like ocamldoc/haddock/doxygen.

Pro:

|* is easy to extend for smldoc - |**.
It is asymmetric, two character.
Stylistically it is close to (* comment *); the opening symbol | does not have a pair like [ or (; blocks of comments are naturally separated by vertical line from code:

fun f x = 8 +  |* intercept term
    (sin x) +  |* sin component
    (cos x) +  |* cos component
    (sqrt x)   |* this works only for x >= 0

It gives the same benefits as (* *) when commenting parts of the code. You need to add just one symbol to the right of pattern-matching or sum type declaration:

(* here we temporary comment out cos x with just two '*' *)
fun f x = 1.0 (*cos x*) 

datatype option a' = None
      | Some a'
      |* Unsure (* may be we will need this in the future? *)

Contra:

(*) is already in proposal.
|* is relatively hard to type compared to --, // or other common symbols.

The name of the language

Today if a product is not easily searchable, it almost certainly
cannot become popular.

Searching for anything SML-related is a pain. To find about sth I
usually start with "SML" "sth". If nothing pops up, I continue with
"Standard ML" "sth". And so on, with "StandardML", "Standard-ML",
"ML97", "ML`97", "ML'97", and if I'm desperate I will also try
"Standard Meta Language", or just "ML". Even if my search "SML" "sth"
returns some relevant results, I'll often continue with other names,
as I'm often interested in exhaustive search.

The same goes for library search. When I was looking for SML books, I
had to look up the catalogue under ML, SML, StandardML.

Unfortunately, the name SuccessorML is as bad as StandardML, or maybe
even worse. I've seen variants such as: "SuccessorML", "Successor ML",
"Successor-ML", "sML".

Also it is a problem that "SML" (for Standard) and "sML" (for
Successor) differ only by case. This makes it hard to express
succinctly without ambiguity which language I'm talking about. This
can lead to people forming even more names, i.e. "SuccML" found here:

#16

Also "SML" and "sML" are indistinguishable by search engines. This
makes it impossible to search for information about only one of
them. If they were called X and Y, then I could search for only "X",
only "Y", or "X or Y". But as it is now, I'm always searching for
both, so the option to search for only one of them is not available.

The problem is compounded by the fact that there are hundreds of other
things (companies, products, etc.) that abbreviate to SML.

Here are some desirable properties for language name that I can think of:

There must be exactly one name. No variations, abbreviations,
synonyms, different spellings. This prevents the search-hell.

Must be as unique as possible. No other things called the same.

Must be short, so people don't feel the pressing need to shorten
it. Shortening leads to multiple names, which leads to search-hell.

Not like a common word (e.g. go), too many false hits.

Should be pronouncable. E.g. "5~f" has all previous properties, but is
hard to pronounce.

Should refer to previous versions of the language, if they are similar.

To give an example of a name close to ideal (having all of the above
properties): C++.

IMHO one good possibility would be "ML16", with anything between "ML"
and "16" being explicitly forbidden by the standard. That would also
clarify which snapshot of the language we're talking about. It looks
like Successor is supposed to be an evolving, roling-release language,
but this makes it very hard for compiler writers to specify (and for
programmers to understand) what does a given compiler conform to. If a
compiler claims successor support, is it successor definition from a
week ago, a month ago, some specific git version, or? It's much better
to make a branch of successor called "ML16 alpha", stabilize, and
release as "ML16", and do such a thing every year.

Lexical syntax for numeric literals.

This thread started as an SML Basis Library issue, but it probably should be here.

The question is where to allow underscores in numeric literals. The specification as written leads to the following regular-expressions for integer literals:

[+~-]?(0b|0B)[0-1]+(_[0-1]+)*
[+~-]?[0-9]+(_[0-9]+)*
[+~-]?(0x|0X)[0-9a-fA-F]+(_[0-9a-fA-F]+)*

The following table summarizes what other languages support (including what
is currently in the Successor ML Definition and what MLton and SML/NJ
currently implement).

Language	Leading	Trailing	Consecutive
Ada	no	no	no
Java	no	no	yes
OCaml	no	yes	yes
Rust	no	yes	yes
SuccML	no	no	no
MLton	no	no	yes
SML/NJ	no	no	no

Other languages, such as C#, F#, and Haskell do not allow underscores in numeric literals.

Cost Dynamics

It sure would be nice if the revised dynamics would include at least time costs, and perhaps also space costs, so that one can support algorithm analysis and make clear what parallelism is expected. For example, one might specify that "and" bindings are to be evaluated in parallel. Among many reasons to want this, it would highlight another advantage of having a formal specification of the language, and it would facilitate teaching.

Pattern guards and disjunctive patterns

I didn't see an issue for this, I hope it isn't retreading anything,

the summary of proposed changes contains both disjunctive patterns, and pattern guards. In the (very succinct!) paper ambiguous pattern variables by Scherer et al, from the ML workshop 2016, this combination lead to patterns whose interpretation probably won't match a users naive expectation.

In the case of Ocaml this lead to a new warning, I have yet to look at the proposed semantics for Successor ML to figure out if it is impacted, or if there is an ~~terrible solution such as evaluating the guard twice~~ which would work. As such, not really proposing anything, but thought it should be discussed.

Edit: Initially I had the thought that such cases could be resolved by expanding the pattern guard to evaluate to is_neutral n orelse is_neutral n' after alpha conversion, but this direction seems real misguided so I struck that out.

For - cycle?

What about including Ocaml-style for cycle:

for i = 1 to 18 do ... done (or end)

It is very handy when iterating through arrays or strings. Also, it does not
let one make common mistake with repeating calls in condition, like

for( i = 0; i < strlen(s); i++)

Polymorphic comparison

In SML the programmer gets for free an equality function for all equality types:

op= : ''a * ''a -> bool

This is great - it relieves the programmer from having to define
equality for all of user-defined types (and records involving such
types).

I suggest the Successor-ML adds polymorphic comparison, i.e. total ordering
for all equality types:

compare : ''a * ''a -> order

The lack of polymorphic comparison in SML often practically forces
programmers to use slow datastructures. Faced with the mundane task of
writing long but trivial comparison functions, one often resorts to
using a datastructure where only equality is required. E.g. one might
use a set based on a list instead of one based on a tree.

Example. The type Exp represents expressions:

datatype Exp = Unary of int | Binary of Exp * Exp | Ternary of Exp * Exp * Exp

Now we want to count how many times each expression occured. We can
use a finite map from expressions to their count. If we have the
polymorphic comparison at our disposal, we are good to go.

However without polymorphic comparison, we need to write a trivial,
yet very long and error-prone function:

fun compare_exp (Unary i, Unary j) = Int.compare (i,j)
  | compare_exp (Unary _, _) = LESS
  | compare_exp (_,Unary _) = GREATER
  | compare_exp (Binary(e1,e2), Binary(e3,e4)) = (case compare_exp (e1,e3) of
                                               EQUAL => compare_exp (e2,e4)
                                            | ord => ord)
  | compare_exp (Binary _, _) = LESS
  | compare_exp (_, Binary _) = GREATER
  | compare_exp ... = ... (* No, we're not done yet. *)

The polymorphic comparison should be easy to define and implement -
after all equality is already there, and it's very similar. One just
needs to define a standard order for datatypes and records.

I suggest datatypes are sorted in definition order. This will give the
programmer an easy way to control the sorting order. In our example
Unary < Binary < Ternary. If programmer wants to sort them for
instance alphabetically, all that needs to be done is swap the order
in the datatype definition to:

datatype Exp = Binary of Exp * Exp | Ternary of Exp * Exp * Exp | Unary of int

For records (including tuples) my suggestion is to sort first by #1,
then by #2, then by #a, then by #z (i.e. analogous to Int.compare for
numeric labels and String.compare for alphanumeric labels).

definitions and sequencing

I would say that one thing we've learned is that combining two aspects of a val binding into one is a mistake. On one hand val is used as a definition mechanism that induces polymorphism in accordance with the substitution principle. On the other hand val is a "monadic bind" that enforces sequencing of evaluation, which is particularly important when using effects. The two coincide when the rhs is a value, but we know that this simple-minded criterion is just that, especially in the presence of data abstraction. A classic example is to do with an abstract type of parsers, which are "in reality", functions, but abstractly are not, and this screws up the "value restriction" greatly.

Why not separate the two roles? We could have "val" for sequencing, and "def" for polymorphism. The semantics is clear, and never ambiguous, and you never have to fight the value restriction.

A related point is to change the syntax from "let dec in exp end" to "dcl dec in exp end". People hate "let val x = ...", which I find quite nice, but maybe "dcl" is better, especially now that we have "do" as a possible form of declaration, which I also find quite nice. "dcl" stands for "declare", and it's first argument is indeed a declaration.

`val rec` and nested match

I think it is sensible to disallow nested match in val rec, but I couldn't find such words in the definition. Am I missing something?

In HaMLet S, the following code results in a runtime error.

val rec (f with () = print (g "X")) = fn x => x
and (g with () = print (f "Y")) = fn y => y;

Pattern matching equality types

Standard ML has two branching mechanisms: pattern matching and the more traditional if-then-else. The former is generally considered superior for most purposes. For example:

fun foo nil = bar
  | foo (x :: xs) = qux (x, xs)

is regarded as cleaner code than:

fun foo xs = if null xs then bar else qux (hd x, tl x)

However, SML's existing pattern matching mechanism provides no support for equality testing. One cannot write, for example:

fun foo ((p,a) :: (p,b) :: (p,c) :: xs) = SOME whatever
  | foo _ = NONE

Instead, one has to write the much clumsier:

fun foo ((p,a) :: (q,b) :: (r,c) :: xs) =
    if p = q andalso q = r then SOME whatever else NONE
  | foo _ = NONE

Note that q and r serve no purpose other than to be tested for equality with p.

I propose that repeated variables be allowed in patterns, and that they be required to have eqtypes. As an additional benefit, the equality testing function wouldn't need to be a magical primitive anymore. Instead, it can be defined as a library function:

fun op= (x, x) = true
  | op= _ = false

I would also like to propose that the equality testing function be renamed to ==, since it is confusing to use the same token for definitions and equality testing. But that's a debate for another day.