
neut

neut is a dependently-typed programming language based on the Calculus of Constructions (CoC). Its distinguishing point is that the language determines how to allocate and deallocate memory at compile time, without adding extra annotations to the type system. In other words, the source language of neut is just an ordinary lambda calculus (imagine something like Haskell, OCaml, Idris, Coq, Agda, or Lean), and at the same time it handles memory without relying on, for example, explicit malloc/free or garbage collection.

Theoretically, this is made possible by translating every type of the language into a function that can be used to copy/discard the terms of the type; in this language, a type is not erased, but translated into a computationally-meaningful term.

Practically, this means that you can write your program in the ordinary lambda-calculus without any extra restrictions or annotations, while still being able to control how resources are used in the program. If you want an ordinary-lambda-calculus-based programming language with human-predictable semantics (including its memory management), this might be for you.

Here I briefly summarize the basic properties of neut before diving into details:

  • A compiled language
  • The output is LLVM IR / assembly / executable binary (the last two are via clang)
  • The type system is CoC + fix + int + float + enum - universe hierarchy (+ unsafe nop cast)
  • The evaluation strategy is call-by-value
  • Memory allocation/deallocation is statically determined at compile time

Ideas

Memory Management Based on Linearity

Basics

Let’s see how neut manages resources. The following code prints the string “a” three times:

; download the core library
(ensure core/0.2.0.0
  "https://github.com/vekatze/neut-core/raw/master/release/0.2.0.0.tar.xz")

(include "core/0.2.0.0/core.neut")

(with identity.bind
  (let str "a")
  (let _ (string.print str))
  (let _ (string.print str))
  (string.print str))

The (with identity.bind ...) form is the same as do-notation in Haskell and other languages, specialized to the identity monad.
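For readers coming from Haskell, a rough analogue (an illustration only, not something produced by the neut compiler) of the with identity.bind block is ordinary do-notation over the Identity monad:

import Data.Functor.Identity (Identity, runIdentity)

example :: Identity Int
example = do
  x <- pure 1
  y <- pure 2
  pure (x + y)

main :: IO ()
main = print (runIdentity example)  -- 3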

By running neut build --emit llvm filename.neut, we obtain schematically the following LLVM IR:

declare i8* @free(i8*)
declare i8* @malloc(i64)
define i64 @main() {
  ; Repeat the following three times:
  ;   <memory allocation for the string>
  ;   <create the string "a" (i.e. write the character 'a' to the allocated memory)>
  ;   <write the string into stdout>
  ;   <memory deallocation for the string>
}

The resulting LLVM IR creates a string in memory, prints it, and frees it, three times. The point here is that the resulting code creates two copies of the original string. This is because the variable str is used three times; when a variable is used n times (n > 1), its content is copied at the beginning of the scope to create n instances. If the variable isn’t used at all (n = 0), its content is discarded (deallocated) at the end of the scope. If the variable is used exactly once, that is, linearly (n = 1), its content is used without any copying or discarding operation.

By translating the source calculus in the way sketched above, every variable is guaranteed to be used linearly, except for the ones inside the functions that realize those copying/discarding operations. Thus, if we can check that those copiers and discarders - or exponentializers - use resources properly, we can ensure that the target calculus of this translation handles memory properly. We then check that those exponentializers are indeed sane; this is the basic storyline.

How a variable is copied/discarded is determined by the type of the variable. For example, if the type of a variable is an array type, as in the example above, the copying operation is what you would do in C to copy an array: memory allocation followed by value insertion. If the type is an immediate type like an integer type, the “copying” operation simply reuses the original value as the clone. Every type is translated into a term that copies/discards the terms of the type, including the type of types.
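To make the two flavors of copying concrete, here is a minimal Haskell sketch (an assumption for illustration; in neut this code is generated from the types): a heap value such as a string is copied byte by byte into fresh memory, while an immediate such as an integer is “copied” by simply reusing the value.

import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as C8

-- copying a heap value: allocate fresh memory and copy the bytes
copyString :: BS.ByteString -> BS.ByteString
copyString = BS.copy

-- "copying" an immediate: just reuse the value itself
copyInt :: Int -> Int
copyInt n = n

main :: IO ()
main = do
  C8.putStrLn (copyString (C8.pack "a"))
  print (copyInt 42)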

Types as Exponentializers

Let us go a little deeper. Consider the following schematic user input:

let str : string := "a";
{- CODE THAT USES `str` three times -}

Intuitively, what the compiler does is to translate the term above into the following term:

let str : string := "a";
let (discard-string, copy-string) := TRANSLATE(string);
let str-copy-1 := copy-string str;
let str-copy-2 := copy-string str;
{- THE CODE THAT USES `str-copy-1`, `str-copy-2`, AND `str` LINEARLY -}

where the function copy-string copies a given string without modifying its argument, and discard-string discards a given string.

Note that the second line extracts discard-string and copy-string from the tuple TRANSLATE(string). Every type X is conceptually translated in this manner: into the pair of discard-X and copy-X. These exponentializers - something that allows us to create n copies of x from a single x - are then used in the continuation so that every variable of the type (str in this example) is used linearly.

Incidentally, in the actual implementation, the result of TRANSLATE(string) is not a pair, but a function as in the pseudo-code below:

exp-string i e :=
  if i == 0
  then discard-string(e)
  else copy-string(e)

which is used in the following manner:

let str : string := "a";
let exp-string := TRANSLATE(string);
let str-copy-1 := exp-string(1, str);
let str-copy-2 := exp-string(1, str);
{- THE CODE THAT USES `str-copy-1`, `str-copy-2`, AND `str` LINEARLY -}

This alternative translation frees us from having to create a tuple every time we translate a type. Thus, in the actual implementation, every type is translated into a closed function, which is then lowered to a pointer (1 word).

Also, note that exp-string is a closed function. Because of that, the content of the variable exp-string can simply be a pointer to that closed function. This allows us to use exp-string in a non-linear manner without breaking the resource management system.

It is worth noting here that functions like exp-string are aggressively β-reduced (inlined); they are, after all, ordinary functions that can be defined in the target language.

Notes on Closures

You may be wondering now: “How can we copy/discard a closure? In ordinary closure conversion, a lambda-abstraction is translated into a pair that consists of (1) all the free variables in the abstraction, and (2) a pointer to an appropriately-arranged closed function. How can that tuple be copied/discarded just by using type information like i64 -> bool, which is seemingly useless here? How should we translate the type i64 -> bool?”

That is a valid question. The key to answering it is to generalize the concept of “all the free variables”. Consider the following term:

λ (a : type). λ (x : a). λ (y : i64). (x, y)

In ordinary closure conversion, the free variables of λ (y : i64). (x, y) are calculated to be [x] without making a fuss. Here, however, we generalize the concept so that we “trace” all the free variables included in the type of every free variable. In this case, for example, note that the type of x is a, which is again a free variable if it occurs in λ (y : i64). (...), and thus this a is also considered a free variable. Since the type of a is type, which has no free variables, our tracing stops here, resulting in the chain of free variables [a : type, x : a].

Note that every result of this procedure is necessarily “closed”. That is, if the list [x1 : A1, ..., xn : An] is a result of this tracing process, then the set of free variables in A{i} is a subset of {x1, ..., x{i-1}}. In this way, we extract a closed chain from a lambda-abstraction.

Let us continue with the example λ (y : i64). (x, y). Using the closed chain we have just calculated, the compiler translates this lambda-abstraction conceptually as follows:

(∑ [a : type, x : a], (a, x), LABEL_OF_A_CLOSED_FUNCTION)

That is, a lambda-abstraction is translated into a tuple that consists of (0) the type of its closed chain, (1) its closed chain, and (2) a pointer to an appropriately-arranged closed function. Now, remember that every type is translated into a term that copies/discards the terms of the type. Also remember that every type is lowered to a function pointer, which can be copied/discarded without any malloc/free operations. Thus, we can generate a function that copies/discards a closure conceptually as follows:

exp-closure i closure = do
  -- extract the elements of the closure, freeing the outer 3-word tuple
  let (typeOfChain, chain, label) := closure;
  if i == 0
  then do
    -- discard the chain using the type information
    let _ := typeOfChain(0, chain);
    -- note that both typeOfChain and label are immediate
    return ()
  else do
    -- copy the chain using the type information
    let chainCopy := typeOfChain(1, chain);
    -- create and return a clone
    return (typeOfChain, chainCopy, label)

Thus, we can translate every Π-type into the function exp-closure. Every closure is copied/discarded in the same way, regardless of its actual details. So, information like i64 or bool in i64 -> bool is indeed useless here; it simply isn’t necessary, since every closure knows how to copy/discard itself.

The remaining piece is how a type of a closed chain like ∑ [a : type, x : a] is translated. Suppose we have a term e of type ∑ [a : type, x : a]. Since we already know the type of e, we can eta-expand this term as follows:

let (a, x) := e in
(a, x)

Now, at this point, note that if we can copy both a : type and x : a, we can then copy e as follows:

let (a, x) := e in
let a-copy := {COPY_a} a in
let x-copy := {COPY_x} x in
(a-copy, x-copy)

Thus, thanks to eta-expansion, the problem of copying/discarding a term of type ∑ [a : type, x : a] is reduced to that of copying/discarding its components a : type and x : a.

The actual copying function is constructed inductively as follows. The starting point is the following term:

let (a, x) := e in
(a, x)

Firstly we copy the rightmost element x - using its type a - so that x is used linearly:

let (a, x) := e in
let x-copy := a(1, x) in
(a, x-copy)

Then we copy the term a - using its type type - so that a is used linearly:

let (a, x) := e in
let a-copy := type(1, a) in
let x-copy := a-copy(1, x) in
(a, x-copy)

where type (the type of types, used here as a function) is defined as follows:

type i a =
  if i == 0
  then ()
  else a

This is resource-safe since a type is translated into a function pointer. This creates a term that uses both a and x linearly.

Note that, by its construction, the first element of a closed chain doesn’t contain any free variables, and therefore can be copied without using any free variables.

In conclusion, the copying part of ∑ [a : type, x : a] is defined by the following term:

copy-sigma sig :=
  let (a, x) := sig in
  let a-copy := type(1, a) in
  let x-copy := a-copy(1, x) in
  (a, x-copy)

By using this term, we can now copy the tuple (a, x) in the closure (∑ [a : type, x : a], (a, x), LABEL). The discarding function is defined similarly. That is, we change the starting point to

let (a, x) := e in
()

and do the same procedure. Now we just have to construct the following term:

exp-sigma i sig :=
  if i == 0
  then discard-sigma sig
  else copy-sigma sig

and translate ∑ [a : type, x : a] into the exp-sigma above. In this way we can copy/discard a closure.

Incidentally, this 3-element representation of a closure is not new. Indeed, I was told in a GitHub issue that there exists a work that does a similar (the same?) thing, though I have not been able to read it yet. Also, with a quick search, I found a work from 1996 that does a similar thing. Thus I emphasize here that I do not claim any originality for this generalization of closure conversion.

Notes on Polymorphic Functions

You may be wondering: “What if a function is polymorphic? If the size of an argument is not fixed, how can that function copy the term?”

That is again a valid question, and this is where dependent types come in. First, remember that a polymorphic function in dependent type theory is nothing but an ordinary function with an argument of type tau, where tau is the type of types. For example, the following is a polymorphic function that creates a pair of any type:

to-tuple : Π (a : tau, x : a). a * a
to-tuple (a : tau) (x : a) :=
  (x, x)

This function to-tuple is, for example, used as follows:

to-tuple i64 1           --  ~> (1, 1)
to-tuple bool bool.true  --  ~> (bool.true, bool.true)
to-tuple string "a"      --  ~> ("a", "a")

We can see that the type i64 is used in exactly the same way as 1. A type is nothing but an ordinary term of type tau. And these very terms i64, bool, and string in the example are translated into ordinary closed functions that copy/discard the terms of those types. The to-tuple function can therefore copy the resource x of type a conceptually as follows:

to-tuple :: Π (a : tau, x : a). a * a
to-tuple a x :=
  let x-copy := a(1, x) in
  (x-copy, x)

Thus the answer to the question is: a polymorphic function can copy/discard its polymorphic arguments since the type, which is guaranteed to be passed as an argument, carries the information of how to copy/discard the terms of that type.
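A rough Haskell analogue of this idea (assumed names; in Haskell, the “type as copier” becomes an explicitly passed record, much like a dictionary):

-- a runtime representation of a type: it knows how to copy its terms
newtype Rep a = Rep { copy :: a -> a }

intRep :: Rep Int
intRep = Rep { copy = \n -> n }            -- immediates: reuse the value

stringRep :: Rep String
stringRep = Rep { copy = \s -> map id s }  -- heap values: rebuild the content

toTuple :: Rep a -> a -> (a, a)
toTuple rep x = (copy rep x, x)            -- copy once so that x is used linearly

main :: IO ()
main = do
  print (toTuple intRep 1)       -- (1,1)
  print (toTuple stringRep "a")  -- ("a","a")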

Summary So Far

  • A variable is copied/discarded so that the variable is used linearly
  • A type is lowered into a function pointer that copies/discards the terms of the type
  • Closures can be copied/discarded since they know how to copy/discard themselves
  • Polymorphic functions can copy/discard their polymorphic arguments thanks to the information provided by their type arguments

This is the basic behavior of neut’s resource management system. As you might already be aware, this naive copying/discarding can result in inefficient object code. We often use a variable more than once, as in the example of str:

(with identity.bind
  (let str "a")
  (let _ (string.print str))
  (let _ (string.print str))
  (string.print str))

We can’t say the resulting LLVM IR of this code is efficient enough; we can’t ignore those redundant copy operations.

Fortunately, there is a workaround for this performance problem.

Manual Optimization via Borrowing

The point of the workaround is straightforward: if those copying/discarding operations result from using variables non-linearly, we simply have to use variables linearly. Let’s go back to the first example code:

(ensure core/0.2.0.0
  "https://github.com/vekatze/neut-core/raw/master/release/0.2.0.0.tar.xz")

(include "core/0.2.0.0/core.neut")

(with identity.bind
  (let str "a")
  (let _ (string.print str))
  (let _ (string.print str))
  (string.print str))

We would like to use the variable str linearly. To this end, we can request that string.print include the argument str in its return value. So, the type of string.print shouldn’t be something like string -> top - where top is the unit type - but should be string -> string * top, where A * B is the product type of A and B. More specifically, the implementation of string.print should be something like this:

string.print :: string -> string * top
string.print str = do
  {- print the string `str` -}
  return (str, unit)

With that definition of string.print, we can use the variable str linearly:

let str1 = "a";
let (str2, _) := string.print str1;
let (str3, _) := string.print str2;
let (str4, _) := string.print str3;
unit

Note that the variables str1, str2, and str3 are used exactly once, and str4 is not used at all. Therefore, no copying operation occurs in the code above. Also, since str4 is defined but not used, it is discarded at the end of its scope.

Now we have seen that those redundant copying/discarding operations can be avoided by writing the code in the manner above. There still remains a problem: code clutter. It would be much nicer to have a more concise notation for that code pattern. Toward that end, first note that we can use the same name for the variables str1, str2, str3, and str4 thanks to variable shadowing:

let str = "a";
let (str, _) := string.print str;
let (str, _) := string.print str;
let (str, _) := string.print str;
unit

Now, we just have to introduce a notation that translates:

let foo := string.print &str;

into:

let (str, foo) := string.print str;

With this notation, our running example is rewritten as follows:

let str = "a";
let _ := string.print &str;
let _ := string.print &str;
let _ := string.print &str;
unit

And this is the notation that is implemented in neut. Indeed, the following is valid neut code:

(ensure core/0.2.0.0
  "https://github.com/vekatze/neut-core/raw/master/release/0.2.0.0.tar.xz")

(include "core/0.2.0.0/core.neut")

(with identity.bind
  (let str "a")
  (let _ (string.print &str))
  (let _ (string.print &str))
  (let _ (string.print &str))
  top.unit) ; ~> top.unit

Or,

(ensure core/0.2.0.0
  "https://github.com/vekatze/neut-core/raw/master/release/0.2.0.0.tar.xz")

(include "core/0.2.0.0/core.neut")

(with identity.bind
  (let str "a")
  (let _ (string.print &str))
  (let _ (string.print &str))
  (string.print str)) ; ~> (unit, "a")

This notation is what neut calls “borrowing”. Note that borrowing in neut is nothing but a syntactic translation; it has nothing to do with, for example, the type system or the operational semantics of neut. Indeed, this syntactic translation is performed at parse time in the compiler.
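A toy sketch in Haskell (assumed AST and names; this is not neut’s actual parser) of borrowing as a purely syntactic rewrite: an application whose argument is borrowed becomes a pair-destructuring let that rebinds the borrowed variable.

data Term
  = Var String
  | Borrow String                     -- &x
  | App Term Term
  | Let String Term Term              -- let v := e1 in e2
  | LetPair String String Term Term   -- let (x, v) := e1 in e2
  deriving (Show)

-- `let v := f &x in body`  ~>  `let (x, v) := f x in body`
desugar :: Term -> Term
desugar (Let v (App f (Borrow x)) body) =
  LetPair x v (App (desugar f) (Var x)) (desugar body)
desugar (Let v e body) = Let v (desugar e) (desugar body)
desugar (App e1 e2)    = App (desugar e1) (desugar e2)
desugar t              = t

main :: IO ()
main =
  print (desugar (Let "_" (App (Var "string.print") (Borrow "str")) (Var "unit")))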

Let’s see how the resulting LLVM IR changes. Is it faster now? We can compile the code above by running neut build --emit llvm filename.neut. The output is schematically as follows:

declare void @free(i8*)
declare i8* @malloc(i64)
define i64 @main() {
  ; <memory allocation for the string>
  ; <create the string "a" (i.e. write the character 'a' to the allocated memory)>
  ; <write the string into stdout three times>
  ; <memory deallocation for the string>
  ; <return 0>
}

The point here is that the string “a” is reused without copying, as expected.

In short: the resulting code is faster in that it is free from the redundant copying operations that we saw in the first example.

This is how neut controls resources efficiently, without modifying the type system of the source language.

Manual Optimization via Noetic Computation (New in 0.2.0.0)

Still, this is not enough. Suppose we have a term of type list a, and are trying to calculate its length. The naive implementation would be something like this:

-- length : (a : tau, xs : list a) -> i64
length (a : tau) (xs : (list a)) :=
  case xs of
    nil ->
      0
    cons y ys ->
      1 + length a ys

-- use the function `length`
some-function := do
  let xs := list.new i64 1 2 3
  let len := length i64 xs
  if len < 3
  then foo xs
  else bar xs

The code above calculates the desired result. However, we can see that the function some-function uses xs twice. This means that the list xs is cloned just to calculate its length. This behavior is far from satisfactory. On the other hand, if we rewrite length so that it also returns the original list (to use the optimization that we have just seen), then we need to write something like this:

; length : (a : tau, xs : list a) -> (list a) * i64
length (a : tau) (xs : (list a)) :=
  case xs of
    nil ->
      (nil i64, 0)
    cons y ys -> do
      let (tmp-list, ys-length) := length a ys
      ((cons i64 y tmp-list), 1 + ys-length)

In this implementation, the original list xs is completely destructed and then reconstructed from the ground up using nil and cons. Again, this is not satisfactory.

neut’s solution to this problem is two-fold:

  1. create an “unconsumable” version of xs : list a and use it to avoid redundant copy
  2. find a pattern that allows us to use the unconsumable version of xs safely, and turn it into a syntax

Let me first explain the former part. Let’s call the “unconsumable” version of xs a “noema” of xs. The noema of xs - which is of type noema (list a) - is characterized as follows.

  • The memory representation of the value of a noema is the same as that of the original xs.
  • The noema is copied/discarded in exactly the same way as an integer.

For example, consider the following code (ignore the initialization part):

let xs : noema (list i64) := (INITIALIZATION_OF_NOEMA) in
let a := f xs in
let b := g xs in
top.unit

The code above is compiled into something like this:

let xs : noema (list i64) := (INITIALIZATION_OF_NOEMA) in
let xs-copy := copy-as i64 xs in
let a := f xs-copy in
let b := g xs in
top.unit

which is equivalent to:

let xs : noema (list i64) := (INITIALIZATION_OF_NOEMA) in
let xs-copy := xs in
let a := f xs-copy in
let b := g xs in
top.unit

We can compare the code above with the code for list i64:

let xs : list i64 := [1, 2, 3] in
let xs-copy := copy-as (list i64) xs in
let a := f xs-copy in
let b := g xs in
top.unit

Note that we can easily break the resource management system of neut if we can use this noema without any restrictions. For example, consider the following code:

let xs : noema (list i64) := (INITIALIZATION_OF_NOEMA) in
top.unit

Since xs is discarded not as a list but as an integer, the pseudo-code above does not free the list that is bound to xs. In other words, the code above causes a memory leak. This kind of unsafe behavior will be dealt with in the latter part of this section; here, let us first see how this noema can be utilized for optimization.

We need a way to use a noema. case-noetic is what we will use for this purpose. case-noetic is basically the read-only version of the pattern matching operator case, and thus their uses are more or less the same:

let xs : list i64 := [1, 2, 3] in
case xs of
  nil ->
    true
  cons y ys ->
    false

let xs : noema (list i64) := (INITIALIZATION) in
case-noetic xs of
  nil ->
    true
  cons y ys ->
    false

The differences between case and case-noetic are the following three.

The first one: case is used for a term of an ordinary data-type (list i64), whereas case-noetic is used for a term of a noetic data-type (noema (list i64)).

The second one: case consumes the matched value (xs), whereas case-noetic does not; case-noetic just reads the value. More specifically, a term of type list i64 is represented as a tuple, something like this: (TYPE_OF_INTERNAL_DATA, INTERNAL_DATA, CONSTRUCTOR_LABEL). case extracts the elements from this tuple, deallocates the tuple, and then continues computation using the obtained values (e.g. selecting the correct branch according to CONSTRUCTOR_LABEL). case-noetic also extracts the elements from this tuple, but does not deallocate the tuple. In this sense, case-noetic does not consume the matched value.

The third one: in case-noetic, the types of the newly-bound variables in the patterns are wrapped with noema (...). For example, if we use case-noetic for a term of type noema (list i64), then the y and ys in cons y ys will be of type noema i64 and noema (list i64), respectively. This is required to prevent an internal piece of a noema from being deallocated. If we were to set the type of ys to list i64, the code below would deallocate ys:

let xs : noema (list i64) := (INITIALIZATION) in
case-noetic xs of
  nil ->
    true
  cons y ys ->
    false

since ys is not used. By wrapping this ys with noema (...), we can ensure that a noema is always unconsumable.
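To make the consuming/non-consuming distinction concrete, here is a toy model in Haskell (an assumption for illustration; this is not neut’s real representation): both matches read the (tag, payload) cell, but only the consuming one invalidates (deallocates) it afterwards.

import Data.IORef (IORef, newIORef, readIORef, writeIORef)

data Cell = Cell { tag :: Int, payload :: [Int] }

-- `case`: read the cell, then deallocate it
caseConsume :: IORef (Maybe Cell) -> IO (Int, [Int])
caseConsume ref = do
  Just (Cell t p) <- readIORef ref
  writeIORef ref Nothing          -- the tuple is freed here
  pure (t, p)

-- `case-noetic`: read the cell and leave it intact
caseNoetic :: IORef (Maybe Cell) -> IO (Int, [Int])
caseNoetic ref = do
  Just (Cell t p) <- readIORef ref
  pure (t, p)

main :: IO ()
main = do
  ref <- newIORef (Just (Cell 1 [1, 2, 3]))
  print =<< caseNoetic ref        -- the cell is still available afterwards
  print =<< caseConsume ref       -- the cell is gone after this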

If we have such a case-noetic, then we can write a more efficient length, basically as follows:

-- length-internal : (tau, noema (list a)) -> i64
length-internal (a : tau) (xs : noema (list a)) :=
  case-noetic xs of
    nil ->
      0
    cons y ys ->
      1 + length-internal a ys

-- length : (tau, list a) -> (list a) * i64
length (a : tau) (xs : (list a)) :=
  let xs-noema := CREATE_NOEMA &xs in
  let len := length-internal a xs-noema in
  (xs, len)

We now need to find a way to create and use noemata safely.

Let us move on to the latter part. To achieve safety, we will employ the idea of the ST monad here.

We first declare an opaque type subject, and add a tag of type subject to a noema. In other words, for a term s : subject, we redefine noema a into noema s a. We also define a subject-tagged identity monad, and name it noesis; a noesis s a is the same as identity a except for the additional argument s : subject.

We also define a term noesis.run : Π (a : tau). (Π (s : subject). noesis s a) -> a. This is defined as follows:

noesis.run a f =
  let dummy-subject = cast i64 subject 0 in
  let answer        = f dummy-subject in
  cast (noesis dummy-subject a) a answer

That is, noesis.run executes a given f by supplying it with a dummy argument. This noesis.run can be compared to runST of the ST monad.

Using these new words, we define a syntax with-subject s (x) computation as follows:

  with-subject s (x) computation
~>
  -- (the `?M`s below are meta-variables and are inferred by the compiler)
  noesis.run ?M $ \(s : subject) -> do
    let x = cast ?M (noema s ?M) x
    ans <- computation
    return (cast (noema s ?M) ?M x, ans)

That is, what with-subject s (x) computation does is:

  • cast the x into a noema, using the same name, shadowing the original x : a
  • do the computation under the condition s : subject, x : noema s a
  • return the result of the computation, pairing it with the original x : a, which can be obtained by uncasting x.

Using this with-subject, the noetic length can be fully implemented as follows:

length-internal (s : subject) (a : tau) (xs : noema s (list a)) :=
  case-noetic xs of
    nil ->
      0
    cons y ys ->
      1 + length-internal s a ys

-- length : (a : tau, _ : list a) -> (list a) * i64
length (a : tau) (xs : (list a)) :=
  with-subject s (xs)
    ((noesis.return s) i64 (length-internal s a xs))

where noesis.return is the subject-tagged version of the return operation of the identity monad.

We can use noemata safely as long as we use them via with-subject. Here I use the word “safely” to mean that the following two properties hold:

(1) The content of every noema will be deallocated later. In other words, no memory leak is possible. This can be ensured since the content is returned as a part of the return value of with-subject.

(2) If we can use a noetic variable, then the corresponding resource is always available. In other words, no use-after-free is caused by using a noetic variable. This can be seen by the following reasoning. First, if a use of a noema causes a use-after-free, then the corresponding resource must be deallocated before the use of the noema. Since a use of a noema does not deallocate its content, the original, non-noetic variable must be the one used to deallocate the resource before the use of the noema. Since the non-noetic variable is shadowed inside with-subject, such a situation can only happen outside the with-subject. That is, such a situation can only be realized by returning a noema as a result of with-subject. However, this cannot happen by the very nature of noesis.run. Remember the type of noesis.run: Π (a : tau). (Π (s : subject). noesis s a) -> a. For this noesis.run to return a noema, the a in the type of noesis.run would have to be something like noema s b. Thus, the following type would have to be well-formed: (Π (s : subject). noesis s (noema s b)) -> noema s b. However, this type has an unbound variable s. Therefore, with-subject cannot return a noema. That’s why no use-after-free is possible.
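For comparison, here is how the very same rank-2 trick plays out in Haskell’s ST monad (a standard Haskell example, not neut code): a mutable reference tagged with s can be used freely inside runST, but a program that tries to return it is rejected because s would escape its quantifier.

import Control.Monad.ST (ST, runST)
import Data.STRef (STRef, newSTRef, modifySTRef, readSTRef)

sumST :: [Int] -> Int
sumST xs = runST (do
  acc <- newSTRef 0
  mapM_ (\x -> modifySTRef acc (+ x)) xs
  readSTRef acc)

-- leak :: STRef s Int
-- leak = runST (newSTRef 0)   -- rejected: `s` would escape runST

main :: IO ()
main = print (sumST [1, 2, 3])  -- 6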

This concludes the latter part. By the way, if you want to use the content of a noema, you can “copy” the content of the noema according to its type. This is what noema.incarnate does. This function can be used to, for example, compute the sum of a list:

-- noema.incarnate : (s : subject, a : tau, x : noema s a) -> (noema s a) * a

-- sum-internal : (s : subject, xs : noema s (list i64)) -> i64
sum-internal (s : subject) (xs : noema s (list i64)) :=
  case-noetic xs of
    nil ->
      0
    cons y ys ->
      let value := noema.incarnate s i64 &y in
      value + sum-internal s ys

-- sum : (_ : list i64) -> (list i64) * i64
sum (xs : (list i64)) :=
  with-subject s (xs)
    ((noesis.return s) i64 (sum-internal s xs))

To sum up, we can avoid copying operations by combining the resource management system of neut with the idea of the ST monad.

Incidentally, the region-flavored application of the ST monad is not new. You can find more information by referring to the introduction of the work titled Monadic Regions.

Automatic Optimization via malloc/free Cancellation

neut’s static memory management enables not only the “manual” optimization we have just seen, but also another “automatic” optimization. Remember the first example:

(with identity.bind
  (let str "a")
  (let _ (string.print str))
  (let _ (string.print str))
  (let _ (string.print str))
  (i64 0))

and the output IR of this example code:

declare void @free(i8*)
declare i8* @write(i8*, i8*, i8*)
declare i8* @malloc(i64)
define i64 @main() {
  ; Repeat the following three times:
  ;   <memory allocation for the string>
  ;   <create the string "a" (i.e. write the character 'a' to the allocated memory)>
  ;   <write the string into stdout>
  ;   <memory deallocation for the string>
}

We already judged this code to be inefficient in that it allocates/deallocates memory unnecessarily. More specifically, it deallocates memory that could actually be reused.

Now you might think: if the sizes of allocations/deallocations are known at compile time, isn’t it possible to compare them at compile time and emit code that reuses the allocated memory?

It is indeed possible. When the option --no-alloc-cancellation is not passed, the compiler translates code like this:

define TYPE @FUNCTION_NAME(...) {
  (...)
  ; <memory allocation for the string>
  ; <create the string "a" (i.e. write the character 'a' to the allocated memory)>
  ; <write the string into stdout>
  ; <memory deallocation for the string>    -- (*1)
  ; <memory allocation for the string>      -- (*2)
  ; <create the string "a" (i.e. write the character 'a' to the allocated memory)>
  ; <write the string into stdout>
  ; <memory deallocation for the string>
  (...)
}

into something like this:

define TYPE @FUNCTION_NAME(...) {
  (...)
  ; <memory allocation for the string>
  ; <create the string "a" (i.e. write the character 'a' to the allocated memory)>
  ; <write the string into stdout>
  ; <create the string "a" (i.e. write the character 'a' to the allocated memory)>
  ; <write the string into stdout>
  ; <memory deallocation for the string>
  (...)
}

In other words, the compiler can cancel the memory deallocation at (*1) and the allocation at (*2), reusing the allocated memory in its continuation. This is automatic malloc/free cancellation. Thanks to this fallback optimization, the compiler can emit somewhat more performant code even when the user writes code in an inefficient way.
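A hand-written Haskell analogue (an illustration only, not compiler output) of the cancelled pattern: instead of freeing a one-byte buffer and immediately allocating another of the same size, the buffer is simply reused.

import Foreign.Marshal.Alloc (mallocBytes, free)
import Foreign.Ptr (Ptr)
import Foreign.Storable (pokeByteOff)
import Data.Word (Word8)
import System.IO (stdout, hPutBuf)

main :: IO ()
main = do
  buf <- mallocBytes 1 :: IO (Ptr Word8)  -- single allocation
  pokeByteOff buf 0 (97 :: Word8)         -- create the string "a"
  hPutBuf stdout buf 1                    -- write it to stdout
  -- the free/malloc pair that would sit here is cancelled: reuse `buf`
  pokeByteOff buf 0 (97 :: Word8)
  hPutBuf stdout buf 1
  free buf                                -- single deallocation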

Note that the “create the string” parts are not optimized away from the resulting LLVM IR, in contrast to the borrowing-based version:

define i64 @main() {
  ; <memory allocation for the string>
  ; <create the string "a" (i.e. write the character 'a' to the allocated memory)>
  ; <write the string into stdout>
  ; <write the string into stdout>
  ; <write the string into stdout>
  ; <memory deallocation for the string>
  ; <return 0>
}

Although the compiler can cancel memory allocations/deallocations, it cannot cancel their accompanying initialization processes (at least for now). If you do need performance, you need to write code in the linear/borrowing style.

Summary

  • neut statically determines malloc/free at compile-time via type information
  • The content of a variable is
    • discarded if and only if the variable isn’t used at all
    • untouched if and only if the variable is used exactly once (i.e. used linearly)
    • copied if and only if the variable is used more than once
  • Linearity tends to result in efficient code
  • Non-linearity tends to result in inefficient code
  • Borrowing can be used as a convenient syntactic tool to achieve linearity
  • The ST-monadic approach can be combined with neut’s resource management system for better performance
  • Redundant malloc/free can be reduced by automatic malloc/free cancellation
  • Borrowing-based, or “manually” optimized code is faster than cancellation-based, or “automatically” optimized code

Installation

The currently supported platform is: Linux (x64). macOS support is currently dropped since I do not have any macOS machine right now (I’m waiting for the 16-inch M1 MacBook Pro).

Make sure that you have already installed tar, curl, stack (>= 2.3.0) and clang (>= 11.0.0). Also make sure that you have ~/.local/bin in your $PATH.

Then, clone the repository and build it:

git clone https://github.com/vekatze/neut
cd neut
git checkout 0.2.0.0
stack build
# the following builds the project and tests its behavior
# it also checks memory sanity via the clang option "-fsanitize=memory"
stack test --test-arguments test/data --test-arguments test/compiler
# this installs the executable `neut` into `~/.local/bin`
stack install

To uninstall, you just have to remove the binary ~/.local/bin/neut and the directory ~/.local/share/neut.

As for editor support, you can currently try neut-mode and flycheck-neut if you’re using Emacs. The former package is for syntax highlighting, and the latter one for linting.

Language Overview

Basic Structure

You can find a detailed description of the syntax, the logic, and the semantics of neut in the succeeding sections. Those should suffice to read/write a neut program. Having said that, some might prefer learning from actual source code after taking a brief look at the basics. So here I introduce some essentials that are required to understand - or guess the meaning of - a neut program.

Let’s start. neut consists of two languages: the meta language and the object language. The meta language, which is essentially an untyped lambda-calculus, can be understood as the macro language, whereas the object language can be seen as the ordinary programming language. The source code of neut is first parsed as code of the meta language, and then normalized, generating a list of ASTs. After that, these resulting ASTs are processed as code of the object language.

Next. Notes on meta-level statements. (0) A meta-level program of neut is a list of meta-level statements, processed one by one. (1) ensure and include are the ones that use code written in other files. (2) use/unuse and section/end are the ones that handle namespaces.

Next. Notes on meta-level terms. (0) A meta-level term is just a term in an untyped lambda calculus. (1) The difference between the ordinary untyped lambda calculus and neut’s meta-level language is that the latter has two special syntactic constructs: leaf and node. These are values that represent ASTs. For example, (node (leaf a) (node (leaf b) (leaf c))) is a value that represents the AST (a (b c)). (2) The content of a source file is recognized as code in the meta-level language, and is normalized to obtain an AST. (3) You will find Lisp-like quotes and unquotes in a program written in neut. These are just a way to write leaf and node efficiently, and not a real syntactic construct. For example, `(a (b c) ,x) is a shorthand for (node (leaf a) (node (leaf b) (leaf c)) x), where x is a variable that is defined beforehand.

Next. Notes on programs. (0) A program of neut is a list of statements, processed one by one. (1) use/unuse and section/end are the ones that handle namespaces. These are used in exactly the same way as in the meta language.

Next. Notes on terms. (0) tau is the type of types. (1) Π is the universal quantification. Note that Π (x : A). B is the same as the arrow type A -> B if x ∉ freevar(B). Also note that a lambda-abstraction in neut is n-ary; (λ ((x A) (y B)) e) is not the same as (λ ((x A)) (λ ((y B)) e)). (2) If you want to create a tuple, you can use (sigma-introduction e1 ... en) or (tuple e1 ... en). You can destruct a tuple by (sigma-elimination (x1 ... xn) e cont). (3) You can write (question e) or ?x when you want the compiler to show the type of e or x. (4) You can write * to have the compiler infer the specified part. For example, assuming (define id ((a tau) (x a)) x), you can write (id * bool.true) instead of (id bool bool.true). (5) You will notice that (witness t e) is used here and there. This is a notation defined by (notation (witness t e) ((λ ((x t)) x) e)); a notation that tells the compiler that the type of e is t.

Next. Notes on primitives: (0) The following primitive types are available: i1, i2, i3, …, i64. These are the same as the corresponding integer types in LLVM. (1) You can also use f16, f32, and f64. These are LLVM’s half, float, and double, respectively. (2) You will soon come to want primitive instructions - like add, mul, or xor - that can operate on terms of these types. Constants for this purpose are embedded in neut, and can be used like, for example, (add-i64 1 2), (icmp-eq-i64 0 1), etc. You can find more information on them in this section; basically, you can use the primitives of LLVM.

Next. Notes on the compiler subcommands. (0) You can build a program with neut build path/to/file.neut. (1) You can create a tar.xz archive of a project via neut archive path/to/dir. Then you can upload the archive somewhere, allowing others to ensure and include it.

Now I think you are basically ready to, for example, start reading the files in the test directory, or the files in the core library, referring to the sections below as necessary. After that you should know how to write lambdas, recursive functions, inductive types, tuples, etc.

That pretty much should do it. I hope you enjoy this language.

Syntax and Semantics of the Meta Language

Statements

define-macro, define-macro-variadic

define-macro / define-macro-variadic defines a term of the meta-level language.

Example

(define-macro foo (arg-1 arg-2 arg-3)
  `(pohe ,arg-1 ,arg-2 ,arg-3))

(dry-expand (foo `symbol-1 `symbol-2 `symbol-3))
; ~> `(pohe symbol-1 symbol-2 symbol-3)

(define-macro-variadic foo-variadic (arg-1 rest)
  `(pohe ,arg-1 ,rest))

(dry-expand (foo-variadic `symbol-1 `symbol-2 `symbol-3))
; ~> `(pohe symbol-1 (symbol-2 symbol-3))

Syntax

(define-macro LEAF TREE)
(define-macro LEAF (LEAF ... LEAF) TREE)
(define-macro-variadic LEAF TREE)
(define-macro-variadic LEAF (LEAF ... LEAF) TREE)

Semantics

(define-macro x e) inserts the correspondence x ~> e in the environment.

(define-macro f (x1 ... xn) e) inserts the correspondence f ~> fix (x1, ..., xn). e in the environment.

(define-macro-variadic f (x1 ... xn) e) inserts the correspondence f ~> fix-variadic (x1, ..., xn). e in the environment.

These correspondences are used later when the compiler tries to reduce the variables.

define-prefix, remove-prefix

define-prefix defines an alias for an existing namespace. remove-prefix removes an alias that is defined by define-prefix.

Example

(define-macro foo.bar.pohe `a)

(dry-expand foo.bar.pohe) ; ~> `a
(dry-expand Q.pohe)       ; ~> undefined meta-variable: Q.pohe

(define-prefix Q foo.bar)

(dry-expand Q.pohe)       ; ~> `a

(define-prefix R Q)

(dry-expand R.pohe)       ; ~> `a

(remove-prefix R Q)

(dry-expand R.pohe)       ; ~> undefined meta-variable: R.pohe

(remove-prefix Q foo.bar)

(dry-expand Q.pohe)       ; ~> undefined meta-variable: Q.pohe

Syntax

(define-prefix LEAF LEAF)
(remove-prefix LEAF LEAF)

Semantics

(define-prefix new-prefix old-prefix) registers a new prefix correspondence new-prefix ~> old-prefix. This information is used for namespace resolution.

(remove-prefix new-prefix old-prefix) removes the given prefix correspondence.

dry-expand

dry-expand outputs the result of meta-level reduction of the given term to stdout.

Example

(dry-expand ((lambda-meta (x) `(,x ,x)) `foo))
; ~> `(foo foo)

Syntax

(dry-expand TREE)

Semantics

Semantically, dry-expand does nothing.

Notes

This statement is for debugging purposes; we can use it to see how a macro is expanded.

ensure

ensure fetches the content of the specified URL for later use.

Example

(ensure core/0.2.0.0
  "https://github.com/vekatze/neut-core/raw/master/release/0.2.0.0.tar.xz")

(include "core/0.2.0.0/core.neut")

Syntax

(ensure LEAF STRING)

Semantics

(ensure path URL) fetches the content of the specified URL, extracts the content into ~/.local/share/neut/NEUT_VERSION/path, assuming that the format of the archive is tar.xz. The path must be a valid path string. Every path separator in path is treated “literally”. For example, if the path is some-library/0.2.0.0, the content of the archive is extracted into ~/.local/share/neut/NEUT_VERSION/some-library/0.2.0.0.

If the target directory of archive extraction already exists, ensure does nothing.

This statement is intended to be used in harmony with include.

Archives specified in ensure are expected to be ones created via archive.

include

include “pastes” the content of the specified file.

Example

(include "core/0.2.0.0/free.neut")

(include "./relative/path/from/the/dir/path/of/this/file.neut")

Syntax

(include STRING)

Semantics

{CODE_1}

(include STRING)

{CODE_2}

~>

{CODE_1}

{THE_CONTENT_OF_THE_FILE_SPECIFIED_BY_THE_STRING}

{CODE_2}

Notes

(1) If the first character of the path is dot (“.”), the path is interpreted as a relative one. That is, the path is calculated using the current file’s directory as the base path. Otherwise, the base path is set to be the library path (i.e. ~/.local/share/neut/NEUT_VERSION/library).

(2) If the file is already included, include does nothing.

(3) When including a file, the prefix environment must be empty.

(4) Cyclic inclusion is invalid.
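A small sketch (assumed helper names, not the compiler’s code) of the path resolution described in note (1): dot-prefixed paths are resolved against the including file’s directory, anything else against the library path.

import System.FilePath ((</>))

resolveInclude :: FilePath -> FilePath -> FilePath -> FilePath
resolveInclude currentDir libraryDir path
  | take 1 path == "." = currentDir </> path   -- relative to the including file
  | otherwise          = libraryDir </> path   -- relative to the library path

main :: IO ()
main = do
  putStrLn (resolveInclude "/proj" "/home/u/.local/share/neut/0.2.0.0/library" "./free.neut")
  putStrLn (resolveInclude "/proj" "/home/u/.local/share/neut/0.2.0.0/library" "core/0.2.0.0/core.neut")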

introspect

introspect inspects the state of the compiler and selects statements based on that information.

Example

(introspect OS
  (linux
    (include library "constant/linux.neut"))
  (darwin
    (include library "constant/darwin.neut")))

Syntax

(introspect LEAF (LEAF TREE ... TREE) ... (LEAF TREE ... TREE))

Semantics

(introspect VAR
  (VAR-1 stmt-1-1 ... stmt-1-n{1})
  ...
  (VAR-m stmt-m-1 ... stmt-m-n{m}))

~>

(stmt-i-1)
...
(stmt-i-n)

[where VAR == VAR-i]

If the corresponding value is not found in the clause list, this statement does nothing.

The var in (introspect var (...)) must be a valid compile-time variable. The valid compile-time variables and their possible values are currently as in the table below:

compile-time variable | possible values
OS                    | linux, darwin
architecture          | x86_64, aarch64, (etc.)

section/end

section / end automatically adds the specified prefix to the meta-variables defined by define-macro / define-macro-variadic.

Example

(section pohe)

(define-macro foo `a)

(section qux)

(define-macro bar `b)

(dry-expand foo)       ; ~> `a
(dry-expand pohe.foo)  ; ~> `a

(dry-expand bar)          ; ~> `b
(dry-expand qux.bar)      ; ~> `b
(dry-expand pohe.qux.bar) ; ~> `b

(end qux)

(dry-expand foo)       ; ~> `a
(dry-expand pohe.foo)  ; ~> `a

(dry-expand bar)          ; ~> undefined meta-variable
(dry-expand qux.bar)      ; ~> `b
(dry-expand pohe.qux.bar) ; ~> `b

(end pohe)

(dry-expand foo)       ; ~> undefined meta-variable
(dry-expand pohe.foo)  ; ~> `a

(dry-expand bar)          ; ~> undefined meta-variable
(dry-expand qux.bar)      ; ~> undefined meta-variable
(dry-expand pohe.qux.bar) ; ~> `b

Syntax

(section LEAF)

(end LEAF)

Semantics

The list of statements

(section FOO)
(define-macro x1 e1)
...
(define-macro xn en)
(end FOO)

is equivalent to:

(use FOO)
(define-macro FOO.x1 e1)
...
(define-macro FOO.xn en)
(unuse FOO)

In other words, the section - end statement

  • inserts use / unuse at the beginning and the end of the section
  • adds the name of the section as a prefix of the variables defined by define-macro / define-macro-variadic
  • keeps all the other statements in the section intact

Each section must be paired with an end with the corresponding name.

statement

statement integrates multiple statements into one.

Example

(statement
  (define-macro foo `a)
  (dry-expand foo)) ; ~> `a

Syntax

(statement TREE ... TREE)

Semantics

(statement s1 ... sn)

is equivalent to the following code:

s1
...
sn

use/unuse

use inserts a prefix to the prefix environment. unuse removes a prefix from the prefix environment.

Example

(define-macro foo.bar.buz `a)

(dry-expand buz)         ; ~> undefined meta-variable: `buz`
(dry-expand bar.buz)     ; ~> undefined meta-variable: `bar.buz`
(dry-expand foo.bar.buz) ; ~> `a

(use foo)

(dry-expand buz)         ; ~> undefined meta-variable: `buz`
(dry-expand bar.buz)     ; ~> `a
(dry-expand foo.bar.buz) ; ~> `a

(use foo.bar)

(dry-expand buz)         ; ~> `a
(dry-expand bar.buz)     ; ~> `a
(dry-expand foo.bar.buz) ; ~> `a

(unuse foo)

(dry-expand buz)         ; ~> `a
(dry-expand bar.buz)     ; ~> undefined meta-variable: `bar.buz`
(dry-expand foo.bar.buz) ; ~> `a

Syntax

(use LEAF)

(unuse LEAF)

Semantics

When parsed, the statement (use PREFIX) inserts PREFIX at the head of the prefix environment, which is a list of prefixes.

When parsed, the statement (unuse PREFIX) removes PREFIX from the prefix environment. If the PREFIX is not contained in the environment, the unuse statement does nothing.

Terms

variable

Syntax

LEAF

Semantics

x ~> e (if the correspondence x ~> e is registered in the environment)

Note

If a leaf is not parsed into any other syntactic construct, the leaf is regarded as a variable.

The name of a variable is interpreted in relation to the prefix environment. For example, if the prefix environment is ["foo", "bar", "buz"], the name qux is interpreted in the following way:

  1. Look up a bound variable named qux in the current scope.
  2. If not found, look up a bound variable named foo.qux in the current scope.
  3. If not found, look up a bound variable named bar.qux in the current scope.
  4. If not found, look up a bound variable named buz.qux in the current scope.
  5. If not found, report the error: “undefined variable”.
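A sketch (assumed helper, not the compiler’s code) of this lookup procedure in Haskell: try the bare name first, then each prefix in the prefix environment, in order.

import Data.List (find)

resolve :: [String] -> (String -> Bool) -> String -> Maybe String
resolve prefixEnv isBound name =
  find isBound (name : [p ++ "." ++ name | p <- prefixEnv])

main :: IO ()
main = print (resolve ["foo", "bar", "buz"] (== "bar.qux") "qux")
-- Just "bar.qux"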

implication

Example

((lambda-meta (x y) `(,x ,y)) `foo `bar) ; ~> `(foo bar)

((lambda-meta (f) (f `foo)) (lambda-meta (x) `(,x ,x ,x))) ; ~> `(foo foo foo)

(fix-meta self (xs)
  (if (is-nil xs)
    (node)
    ()))

Syntax

(lambda-meta (LEAF ... LEAF) TREE)
(lambda-meta-variadic (LEAF ... LEAF) TREE)
(fix-meta LEAF (LEAF ... LEAF) TREE)
(fix-meta-variadic LEAF (LEAF ... LEAF) TREE)

Semantics

((lambda-meta (x1 ... xn) e) e1 ... en)
~> e {x1 := e1, ..., xn := en}

((lambda-meta-variadic (x1 ... xn) e) e1 ... en ... em)
~> e {x1 := e1, ..., x{n-1} := e{n-1}, xn := (en ... em)}

((fix-meta self (x1 ... xn) e) e1 ... en)
~> e {x1 := e1, ..., xn := en, self := (fix-meta self (x1 ... xn) e)}

((fix-meta-variadic self (x1 ... xn) e) e1 ... en ... em)
~> e {x1 := e1, ..., x{n-1} := e{n-1}, xn := (en ... em), self := (fix-meta-variadic self (x1 ... xn) e)}

if

Example

(use meta.node)

(define-macro map-meta (f xs)
  (if-meta (is-nil xs)
    xs
    (cons (f (head xs)) (map-meta f (tail xs)))))

(map-meta (lambda-meta (x) `(,x ,x)) `(a b c d))
; ~> `((a a) (b b) (c c) (d d))

Syntax

(if-meta TREE TREE TREE)

Semantics

(if-meta e on-true on-false) first reduces e into a value. If e reduces to (node) (i.e. `(), or nil), then this if-meta reduces to on-false, and reduction of that term follows. Otherwise, this if-meta reduces to on-true, and reduction of that term follows.

constant

Example

(dry-expand (meta.node.head `(a b c d))) ; ~> `a
(dry-expand (meta.node.tail `(a b c d))) ; ~> `(b c d)
(dry-expand (meta.leaf.new-symbol `foo)) ; ~> foo;147
(dry-expand (meta.node.replicate 3 `foo)) ; ~> `(foo foo foo)
(dry-expand (meta.node.reverse `(a b c d))) ; ~> `(d c b a)

Syntax

LEAF

Semantics

Every constant has its own semantics. For example, meta.node.length returns the length of a given node:

(meta.node.length `(a b c d e)) ~> 5

The declarations of all the constants are found in src/Data/MetaTerm.hs. Their behaviors are defined in the function reduceConstApp, which can be found in src/Reduce/MetaTerm.hs.

integer

Example

(dry-expand (meta.int.add 3 4)) ; ~> 7
(dry-expand (if-meta (meta.int.lt 0 10) `a `b)) ; ~> `a

Syntax

LEAF

Semantics

Integers do not have a reduction rule. They are intended to be used in combination with constants like meta.int.add.

leaf/node

Example

(dry-expand (leaf a)) ; ~> `a
(dry-expand `a)       ; ~> `a

(dry-expand (node (leaf a) (node) (leaf b))) ; ~> `(a () b)
(dry-expand `(a () b))                       ; ~> `(a () b)

Syntax

(leaf LEAF)
(node TREE ... TREE)

Semantics

If e1 ~> v1, …, en ~> vn, then (node e1 ... en) ~> (node v1 ... vn). These reductions occur from left to right.

Notes on Quote, Unquote and Splice

We can use Lisp-like quote and unquote to achieve concise notation.

We can quote a term e by writing (quote e). We can also write it as `e.

In a quotation, a symbol is regarded as a leaf. For example, `a is equivalent to (leaf a). Also, in a quotation, a term of the form (e1 ... en) is regarded as a node. For example, `(a b c) is regarded as (node (leaf a) (leaf b) (leaf c)).

We can cancel the effect of a quotation by using an unquote. Unquotation of e is written (unquote e) or ,e. This cancels the effect of the surrounding quotation. For example, `(a b ,c) is parsed as (node (leaf a) (leaf b) c). Thus, if the variable c is not defined, the term `(a b ,c) causes an undefined-variable error.

You can compare this with the string interpolation in other languages:

"hello, #{my-name}" -- string interpolation
`(hello ,x)         -- quote/unquote

The final one is splicing. I believe this is something that is best explained by example. Let us see one.

`(foo bar @(a (b c) d) buz) ; ~> `(foo bar a (b c) d buz)

That is, splicing is the operation that “lifts” all the elements of a child tree of a node into its parent.

Syntax and Semantics of the Object Language

Statements

A program of neut is a list of statements. Each statement is one of the forms described in this section.

Note that the example code below usually assumes that the core library is included.

enum

define-enum defines a new enum-type and its values.

Example

(define-enum choice left right) ; this defines choice : tau, choice.left : choice, and choice.right : choice

(define x choice.left)

(i64.print
  (enum-elimination x
    (choice.left
      (i64 1))
    (choice.right
      (i64 2)))) ; ~> 1

(i64.print (unsafe.cast choice i64 choice.left)) ; ~> 0

(i64.print (unsafe.cast choice i64 choice.right)) ; ~> 1

(define-enum foo
  (a 100)
  b
  (c 20)
  d
  e
  (f 103))

(i64.print (unsafe.cast foo i64 foo.a)) ; ~> 100
(i64.print (unsafe.cast foo i64 foo.b)) ; ~> 101
(i64.print (unsafe.cast foo i64 foo.c)) ; ~> 20
(i64.print (unsafe.cast foo i64 foo.d)) ; ~> 21
(i64.print (unsafe.cast foo i64 foo.e)) ; ~> 22
(i64.print (unsafe.cast foo i64 foo.f)) ; ~> 103

Syntax

(define-enum LEAF LEAF_INT ... LEAF_INT)

LEAF_INT := LEAF | (LEAF INT)

Semantics

(define-enum x a1 ... an) updates the state of the compiler so that the specified enum-type x : tau and the enum-values x.a1, ..., x.an : x can be used in its continuation.

Every enum-value has an internal i64 value (its discriminant). Those discriminant values can be extracted by using unsafe.cast, though this is usually not recommended.

Discriminant values start from 0 by default, and increment one by one. The “current” value of this process can be modified by writing, e.g., (define-enum foo a (b 100) c d). In this example, the discriminant value of c is set to 101.

All the discriminant values of an enum-type must be distinct.
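A small sketch (assumed helper, not the compiler’s code) of this discriminant assignment: the counter starts at 0, increments after every value, and is reset whenever an explicit discriminant is given.

assignDiscriminants :: [(String, Maybe Int)] -> [(String, Int)]
assignDiscriminants = go 0
  where
    go _ [] = []
    go _    ((name, Just v)  : rest) = (name, v)    : go (v + 1) rest    -- explicit value resets the counter
    go next ((name, Nothing) : rest) = (name, next) : go (next + 1) rest -- otherwise use the running counter

main :: IO ()
main =
  print (assignDiscriminants
    [("a", Just 100), ("b", Nothing), ("c", Just 20), ("d", Nothing), ("e", Nothing), ("f", Just 103)])
-- [("a",100),("b",101),("c",20),("d",21),("e",22),("f",103)]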

define

define defines a new top-level variable.

Example

(define foo (i64 10)) ; define a variable `foo` to be `10`

(i64.print foo) ; ~> 10 (this is equivalent to `(let _ (i64.print foo))`)

; ordinary definition (i.e. 1-mutual definition)
(define fact ((x i64))
  (if (icmp-sle-i64 x 0) ; compare two `i64`s as signed integers
    1
    (mul-i64 x (fact (sub-i64 x 1)))))

(i64.print (fact foo)) ; ~> 3628800 (= 10!)

; mutual recursion can be realized via the usual way:
(define even-f ((f (hom i64 bool)) (n i64))
  (if (icmp-eq-i64 n 0)
    true
    (f (sub-i64 n 1))))

(define odd ((n i64))
  (if (icmp-eq-i64 n 0)
    false
    (even-f odd (sub-i64 n 1))))

(define even ((n i64))
  (even-f odd n))

(i64.print
  (if (even 10)
    100
    1000)) ; ~> 100

Syntax

(define LEAF_PLUS TREE)
(define LEAF_PLUS (LEAF_PLUS ... LEAF_PLUS) TREE)

LEAF_PLUS ::= LEAF | (LEAF TREE)

Semantics

(define f (xt-1 ... xt-n) e) is equivalent to (define f (fix f (xt-1 ... xt-n) e)).

(define x e) checks the type of the term e, then registers the correspondence x ~> e to the environment of the compiler. After the definition, using the variable x is the same as writing e in place of x.

Note

Note that define does not evaluate the term e. For example, consider the following code.

(define foo
  (string.print-literal "hello!"))

(reduce
  (let _ foo)
  (let _ foo)
  top.unit)

The code prints hello! not once but twice, since define does not reduce the body of the variable.

define-prefix, remove-prefix

The behaviors of define-prefix and remove-prefix are essentially the same as the ones of the meta language.

reduce

reduce reduces a given term.

Example

(reduce
  (with identity.bind
    (let x (i64 1))
    (let y (i64 2))
    (add-i64 x y))) ; ~> 3

(reduce
  (string.print-literal "hello, world!\n")) ; ~> (prints "hello, world!\n")

Syntax

(reduce TREE)

Semantics

(reduce e1 ... en) is equivalent to (reduce (with identity.bind e1 ... en)) when n > 1.

(reduce e) reduces - evaluates - the term e.

use/unuse

The behaviors of use and unuse are essentially the same as the ones of the meta language.

section/end

The behaviors of section and end are essentially the same as the ones of the meta language.

Terms

tau

tau is the type of types.

Example

(define id ((a tau) (x a)) x)

(id i64 10)

Syntax

tau

Inference Rule

-------------------(empty)
well-formed(EMPTY)


well-formed(Γ)
--------------- (tau)
Γ |- tau : tau

Semantics

tau doesn’t have any operational semantics.

variable

Example

(define x (i64 10))

(i64.print x) ; ~> 10

(define _ (i64 20)) ; anonymous variable

Syntax

LEAF

Inference Rule

    Γ |- A : tau
------------------------- (ext)
well-formed(Γ, x : A)


well-formed(Γ)  (x : A) ∈ Γ
---------------------------- (var)
       Γ |- x : A

Semantics

A variable doesn’t have any operational semantics by itself.

Notes

If a leaf is not parsed into any other syntactic construct, the leaf x is regarded as a variable.

The name of a variable is interpreted in relation to the prefix environment. For example, if the prefix environment is ["foo", "bar", "buz"], the name qux is interpreted in the following way:

  1. Look up a bound variable named qux in the current scope.
  2. If not found, look up a bound variable named foo.qux in the current scope.
  3. If not found, look up a bound variable named bar.qux in the current scope.
  4. If not found, look up a bound variable named buz.qux in the current scope.
  5. If not found, report the error: “undefined variable”.

Π

Π is the universal quantification.

Example

; nullary Π-introduction
(define f1
  (lambda () (i64 1))) ; 'lambda' can be used instead of 'Π-introduction' (we just have to write `(define-macro lambda Π-introduction)`)

; unary Π-introduction
(define f2
  (Π-introduction ((x i64)) x))

; Π-elimination
(Π-elimination i64.print (Π-elimination f2 2))

; Π-elimination with the familiar (or, implicit) syntax
(i64.print (f2 2))

; binary Π-introduction
(define f3
  (λ ((x i64)   ; an argument with type annotation
      y)        ; an argument without type annotation
    (i64.add x y)))

(i64.print (f3 1 2))

; fix
(define fact
  (Π-introduction-fix self ((x i64))
    (if (icmp-sle-i64 x 0) ; i.e. if x <= 0
      1
      (mul-i64 x (self (sub-i64 x 1))))))

Syntax

(Π (LEAF_PLUS*) B)
(Π-introduction (LEAF_PLUS*) e)
(Π-introduction-fix LEAF_PLUS (LEAF_PLUS*) e)
(Π-elimination TREE+)
(TREE+)

LEAF_PLUS ::= LEAF | (LEAF TREE)

Inference Rule

Γ |- A1 : tau    Γ, x1 : A1 |- A2 : tau    (...)    Γ, x1 : A1, ..., xn : An |- B : tau
---------------------------------------------------------------------------------------- (Π)
                    Γ |- (Π ((x1 A1) ... (xn An)) B) : tau


             Γ, x1 : A1, ..., xn : An |- e : B
------------------------------------------------------------------------------- (Π-introduction)
 Γ |- (Π-introduction ((x1 A1) ... (xn An)) e) : (Π ((x1 A1) ... (xn An)) B)


       Γ, f : (Π ((x1 A1) ... (xn An)) B), x1 : A1, ..., xn : An |- e : B
--------------------------------------------------------------------------------- (Π-introduction-fix)
Γ |- (Π-introduction-fix f ((x1 A1) ... (xn An)) e) : (Π ((x1 A1) ... (xn An)) B)


Γ |- e : (Π ((x1 A1) ... (xn An)) B)   Γ |- e1 : A1   (...)   Γ |- en : An {xi := ei}
-------------------------------------------------------------------------------------- (Π-elimination)
              Γ |- (Π-elimination e e1 ... en) : B {xi := ei}

Semantics

(Π-elimination e e1 ... en)
~> (Π-elimination v v1 ... vn)  [i.e. reduce e and ei into the values v and vi, from left to right]

(Π-elimination (Π-introduction ((x1 A1) ... (xn An)) e) v1 ... vn)
~> e {x1 := v1, ..., xn := vn}

(Π-elimination (Π-introduction-fix self ((x1 A1) ... (xn An)) e) v1 ... vn)
~> e {x1 := v1,
      ...,
      xn := vn,
      self := (Π-introduction-fix self ((x1 A1) ... (xn An)) e)}

Notes

If a tree (e e1 ... en) is not parsed into any other terms, the tree is regarded as (Π-elimination e e1 ... en).

If the name of an argument of a Π-introduction is _, the compiler automatically generates a fresh name so that the variable cannot be used in its scope.

Note that the arguments of a lambda-abstraction are generalized from unary to n-ary. This enables the compiler to emit more performant code when a lambda-abstraction receives multiple arguments; Without that generalization, the arguments would have to be represented as a tuple, discarding the possibility of passing the arguments of a function via multiple registers.

Incidentally, I personally recommend using the witness notation when defining a function, so that the resulting type of the function is written explicitly. For example, the code

(define fact ((x i64))
  (witness i64
    (if (icmp-sle-i64 x 0)
      1
      (mul-i64 x (fact (sub-i64 x 1))))))

is preferred to:

(define fact ((x i64))
  (if (icmp-sle-i64 x 0)
    1
    (mul-i64 x (fact (sub-i64 x 1)))))

As for Π-introduction-fix, every tail call is optimized into a loop.
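
For illustration, here is a hedged sketch of a tail-recursive variant of the factorial above, rewritten with an accumulator (the accumulator-style rewrite is not part of the original example); since the call to self is in tail position, it should be compiled into a loop:

(define fact-tail
  (Π-introduction-fix self ((x i64) (acc i64))
    (if (icmp-sle-i64 x 0)
      acc
      (self (sub-i64 x 1) (mul-i64 x acc))))) ; tail call ~> compiled into a loop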

Also, remember that the logic of neut doesn’t adopt the universe hierarchy and is thus inconsistent. This means that the Z combinator can be written in the source language as an ordinary term. In other words, from the viewpoint of expressive power, Π-introduction-fix is simply redundant; it exists only for optimization purposes.

constant

constant is for external constants.

Example

((constant add-i64) 1 3) ; ~> 4
(add-i64 1 3)            ; ~> 4

Syntax

(constant LEAF)
LEAF

Inference Rule

Γ |- A : tau          {`c` is declared to be a constant of type `A`}
--------------------------------------------------------------------- (constant)
         Γ |- (constant c) : A

Semantics

The constant rule doesn’t have any operational semantics by itself; Each constant has its own dedicated semantics.

Notes

If a leaf is declared to be a constant using the constant statement beforehand, the leaf is interpreted as (constant LEAF).

int

i{n} is the integer type in LLVM.

Example

(add-i64 (i64 1) 2)
(i32 10)
23456789
(mul-i2 (i2 100) 3)

Syntax

; the integer type i{n}
i{n} [where n is one of 1, 2, ..., 64]

; an integer of type i{n}
(i{n} LEAF)  [where n is one of 1, 2, ..., 64]

; an integer without explicit type info
LEAF [where this LEAF can be parsed as integer]

Inference Rule

well-formed(Γ)     {`l` is an integer}     {i{n} is a valid integer type}
--------------------------------------------------------------------------- (integer)
                  Γ |- (i{n} l) : i{n}

Semantics

The terms of an integer type don’t have any operational semantics by themselves.

Notes

The int type in neut is the same as the one of LLVM, restricted to i1, i2, …, i64.

Every integer of type i{n} is interpreted modulo 2^n, just as in LLVM. For example, (i2 10) is the same as (i2 6), (i2 2), (i2 -2), or (i2 -6), since all of these are equivalent modulo 2^2.

An integer without explicit type information is overloaded; its type is first set to be unknown, and then inferred.

float

f{n} is the float type in LLVM.

Example

(f16 3.8)
(f32 9.22888)
(f64 1.23456789)
(fadd-f64 1.23456 (f64 7.89))

Syntax

; the float type f{n}
f{n}   [where n is one of 16, 32, 64]

; a float of type f{n}
(f{n} LEAF) [where n is one of 16, 32, 64]

LEAF [where this LEAF can be parsed as float]

Inference Rule

well-formed(Γ)     {`l` is a float}      {f{n} is a valid float type}
------------------------------------------------------------------------ (float)
                  Γ |- l : f{n}

Semantics

The terms of a float type don’t have any operational semantics by themselves.

Notes

The float type in neut is the same as the one of LLVM. Specifically, LLVM’s half corresponds to neut’s f16, float to f32, and double to f64.

A float without explicit type information is overloaded; its type is first set to be unknown, and then inferred.

enum

enum is the enumeration type.

Example

See the example in the section about the enum statement.

Syntax

(enum LEAF)

(enum-introduction LEAF)
LEAF

(enum-elimination TREE
  (ENUM_CASE TREE)
  ...
  (ENUM_CASE TREE))

ENUM_CASE ::= LEAF | default

Inference Rule

well-formed(Γ)  {`E` is declared to be an enum type}
----------------------------------------------------- (enum)
                 Γ |- (enum E) : tau


well-formed(Γ)   {`l` is a value of enum-type `(enum E)`}
----------------------------------------------------------- (enum-introduction)
       Γ |- (enum-introduction l) : (enum E)


Γ |- e : (enum E)       (Γ |- ei : A) for each li ∈ E = {l1, ..., ln}
---------------------------------------------------------------------- (enum-elimination)
       Γ |- (enum-elimination e (l1 e1) ... (ln en)) : A

Semantics

(enum-elimination (enum-introduction c)
  (c1 e1)
  ...
  (cn en))
~> ei [where c = ci]

(enum-elimination (enum-introduction c)
  (c1 e1)
  ...
  (cn en)
  (default e)
  ...)
~> e [where c != c1, ..., cn]

Notes

The cases of an enum-elimination must be exhaustive.
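
For example, here is a hedged sketch of boolean negation written with enum-elimination, assuming that bool is declared as an enum whose values are true and false (as in the core library):

(define negate ((b bool))
  (witness bool
    (enum-elimination b
      (bool.true bool.false)
      (default bool.true)))) ; the default case keeps the clauses exhaustive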

question

question requests the compiler to show the type of a term.

Example

(define x top.unit)

(question x)

Syntax

(question TREE)

?TREE

Inference Rule

     Γ |- e : A
----------------------- (question)
Γ |- (question e) : A

Semantics

(question e)
~> e

Notes

The type of a term wrapped by question is reported by the compiler. This might be useful when used in harmony with a linter like flycheck.

derangement

A derangement is something that can be used to utilize an extra-linguistic feature, such as syscall.

Example

; call an external function "malloc"
(define allocate ((size i64))
  (witness unsafe.pointer
    (derangement (external malloc) size)))

; use LLVM store instruction
(define store-i8 ((value i8) (ptr unsafe.pointer))
  (witness top
    (derangement (store i8) ptr value)))

; use nop to define the unsafe cast
(define unsafe.cast ((A tau) (B tau) (x A))
  (witness B
    (derangement nop x)))

; using a syscall (write)
(define write ((out file-descriptor) (buf unsafe.pointer) (len i64))
  (witness i64
    (derangement (syscall 1) out buf len)))

Syntax

(derangement DERANGEMENT-KIND TREE ... TREE)

DERANGEMENT_KIND :=
  (syscall INT) |
  (external LEAF) |
  (load LOW_TYPE) |
  (store LOW_TYPE) |
  (create-array LOW_TYPE) |
  (create-struct LOW_TYPE ... LOW_TYPE) |
  nop

LOW_TYPE :=
  i{n} |
  f{n} |
  (pointer LOW_TYPE) |
  (array INT LOW_TYPE) |
  (struct LOW_TYPE ... LOW_TYPE)

Inference Rule

Γ |- e1 : A1   ...   Γ |- en : An   (k is a derangement-kind)    Γ |- B : tau
------------------------------------------------------------------------------- (derangement)
             Γ |- (derangement k e1 ... en) : B

Semantics

(derangement (syscall k) e1 ... en) calls the syscall k with e1, …, en as its arguments.

(derangement (external f) e1 ... en) calls the external function f with e1, …, en as its arguments.

(derangement (load t) e) loads the value from the pointer specified by e, assuming that the value is of type t.

(derangement (store t) e v) stores the value v to the pointer specified by e, assuming that the value is of type t.

(derangement (create-array t) e1 ... en) creates an array [e1, ..., en], assuming that the values are of type t. The behavior of create-array can be illustrated as follows:

let p := malloc(size-of(t) * n) in
let _ := store(p + (size-of(t) * 0), e1) in
let _ := store(p + (size-of(t) * 1), e2) in
...
let _ := store(p + (size-of(t) * (n-1)), en) in
p

(derangement (create-struct t1 ... tn) e1 ... en) creates a struct {t1, ..., tn}, assuming that the value ei is of type ti.

(derangement nop e) simply returns e.

Note

The arguments and the resulting type of a derangement do not have any restrictions, and thus a derangement is inherently unsafe. Derangements are intended to be used with wrapper functions.

The resulting type must be annotated using something like witness since there is no way to determine it by the derangement itself.

Incidentally, the behavior of unsafe.pointer from the viewpoint of resource management is the same as an integer.

Auxiliary Statements

define-data

define-data defines an algebraic data type.

Example

; this defines `list`, `list.nil` and `list.cons`
(define-data list ((a tau))
  (nil)
  (cons (head-value a) (tail-value (list a))))

(define is-null
  ((a tau)
   (xs (list a)))
  (witness bool
    (case xs
      ((list.nil)
        bool.true)
      ((list.cons _ _)
        bool.false))))

(define-data free ((g (hom tau tau)) (a tau))
  (pure (_ a))  ; the name of a constructor argument can be "_"
  (impure
    (b tau) ; dependence
    (_ (g b))
    (_ (hom b (free g a)))))

Syntax

(define-data LEAF ((LEAF TREE) ... (LEAF TREE))
  (LEAF (LEAF TREE) ... (LEAF TREE))
  ...
  (LEAF (LEAF TREE) ... (LEAF TREE)))

Semantics

A define-data defines a so-called algebraic data type and its introduction rules (that is, the constructors of the inductive type). It also registers the type as an algebraic data type so that it can be used in combination with case and case-noetic.

define-codata

define-codata defines a record type that can also be recursive.

Example

(define-codata monad ((m (hom tau tau)))
  (as-functor
    (functor m))
  (return
    (return-type m))
  (bind
    (bind-type m)))

(define-data option ((a tau))
  (none)
  (some (value a)))

(define option.as-functor
  (witness (functor option)
    ...))

(define option.return
  (witness (return-type option)
    ...))

(define option.bind
  (witness (bind-type option)
    ...))

(define option.as-monad
  (witness (monad option)
    (monad.new
      option
      option.as-functor
      option.return
      option.bind)))

; projection (extract the value for the key `bind`)
(define to-bind ((m (hom tau tau)) (dict (monad m)))
  (monad.bind m dict))

(define-codata stream ((a tau))
  (head a)
  (tail (Π () (stream a))))

(define int-stream ((k i64))
  (witness (stream i64)
    (stream.new
      i64
      k
      (lambda () (int-stream (add-i64 k 1))))))

(define stream-from-zero
  (witness (stream i64)
    (int-stream 0)))

Syntax

(define-codata LEAF ((LEAF TREE) ... (LEAF TREE))
  (LEAF TREE)
  ...
  (LEAF TREE))

Semantics

(define-codata a bts ct-1 ... ct-n) first creates the statement (define-data a bts (new ct-1 ... ct-n)) and evaluates it. After that, it generates a projection for each ct-i.
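
For example, the stream definition above roughly desugars into the following (a conceptual sketch; the exact form of the generated projections may differ):

(define-data stream ((a tau))
  (new
    (head a)
    (tail (Π () (stream a)))))

; ...plus a projection for each field, conceptually:
;   stream.head : (Π ((a tau) (s (stream a))) a)
;   stream.tail : (Π ((a tau) (s (stream a))) (Π () (stream a)))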

define-resource-type

define-resource-type creates a new type by specifying how the terms of the type are copied/discarded.

Example

(define-resource-type some-new-type
  ; discarder
  (lambda (ptr)
    (with identity.bind
      <discard the ptr using, for example, the external function `free`, using a derangement>
      top.unit))
  ; copier
  (lambda (ptr)
    (with identity.bind
      (let new-ptr <copy the argument ptr>)
      new-ptr)))

Syntax

(define-resource-type name discarder copier)

Semantics

(define-resource-type name discarder copier) defines a new opaque type by its resource behavior. The terms of the type are discarded using the function discarder, which is of type unsafe.pointer -> top, and copied using the function copier, which is of type unsafe.pointer -> unsafe.pointer.

Note

Since a type defined by define-resource-type does not have any introduction rules or elimination rules, the library designer must define them explicitly, using, for example, the unsafe cast, as in the sketch below.
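
As a hedged sketch, an introduction rule for the some-new-type above could be defined using the unsafe cast together with the allocate wrapper from the derangement section (the name some-new-type.new is illustrative):

(define some-new-type.new ((size i64))
  (witness some-new-type
    ; reinterpret the raw pointer returned by allocate as a value of some-new-type
    (unsafe.cast unsafe.pointer some-new-type (allocate size))))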

The core library uses this statement to define the types i8-array, i16-array, etc. For example, an i8-array is internally represented as a tuple of the form (LENGTH, ACTUAL_ARRAY). The discarder for this type deallocates the outer tuple and the ACTUAL_ARRAY, whereas the copier allocates a new array using malloc and LENGTH, copies the original values, and then returns the newly-created array. You can find the actual implementations in array.neut of the core library.

Auxiliary Terms

asterisk

* is a placeholder that must be inferred by the compiler.

Example

(define id ((a tau) (x a)) x)

(enum foo value)

(id foo foo.value) ; ~> foo.value

(id * foo.value) ; ~> foo.value (`*` is inferred to be `foo`)

Syntax

*

Semantics

* doesn’t have any operational semantics.

Notes

* can be used as a placeholder that must be resolved by the compiler using the constraints generated in its type inference procedure.

If the type is not determined, the compiler raises an error; For example, the type of x in the following code is not determined:

(define x 10)

since 10 by itself cannot be determined to be i32, i16, i64, etc.
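
Such an error can be avoided by annotating the type explicitly:

(define x (i64 10))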

sigma

sigma is the existential quantification (tuple).

Example

; binary sigma-intro without dependence
(define pair
  (sigma-introduction (i64 2) (λ ((x tau)) x)))

; binary sigma-elim without dependence
(reduce
  (sigma-elimination (x _) pair
    (i64.print x)))

(define n-pair
  (sigma-introduction
    (i64 1)
    (λ ((x tau)) x)
    (f32 10.82)
    top.unit
    top.unit
    top
    tau))

(define 0-pair
  (sigma-introduction))

Syntax

(sigma ((x1 A1) ... (xn An)) B)

(sigma-introduction e1 ... en)

(sigma-elimination (LEAF_PLUS ... LEAF_PLUS) e1 e2)

LEAF_PLUS ::= LEAF | (LEAF TREE)

Semantics

(sigma ((x1 A1) ... (xn An)) B)
~> (Π ((Z tau)
        (_ (Π ((x1 A1) ... (xn An) (_ B)) Z)))
       Z)

(sigma-introduction e1 ... en)
~> (Π-introduction
     ((Z tau)
      (k (Π ((x1 hole) ... (xn hole)) Z)))
     (k e1 ... en))

(sigma-elimination ((x1 A1) ... (xn An)) e1 e2)
~> (e1 * (lambda ((x1 A1) ... (xn An)) e2))

Notes

A sigma is just a convenient notation of a certain use of Π. The encoding is the ordinary one in CoC.

sigma without dependence is the familiar product type; (sigma ((_ A1) ... (_ An)) B) is (product A1 ... An B).
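
For instance, a non-dependent pair of two i64 values could be written as follows (a hedged sketch; product here is the core-library notation mentioned above):

(define my-pair
  (witness (sigma ((_ i64)) i64) ; i.e. (product i64 i64)
    (sigma-introduction (i64 3) (i64 4))))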

with

with is a do-notation with the bind operation made explicit.

Example

(with identity.bind
  (let str "foo")
  (let _ (string.print &str))
  (string.print &str) ; the same as (let _ (string.print &str))
  (let x (i64 10))
  (let y (add-i64 100 x))
  (i64.print y))

(with identity.bind
  (let str "foo")
  (let _
    (let _ (i64 100)) ; each `e` in `(let x e)` is implicitly wrapped by `with`
    (string.print &str)
    (string.print &str))
  (string.print &str)
  (let len (string.print &str))
  len)

Syntax

(with TREE TREE ... TREE)

Semantics

(with bind (let x (e e1 ... en)) rest+)
~> (bind * * (with bind (e e1' ... en'))
     (lambda (sig)
       (sigma-elimination (x1 ... xj) sig (with bind rest+))))
where:
  ei' := if ei == &x then x else ei
  x1, ..., xj := (all the "borrowed" variables in e1, ..., en)
  sig : a fresh variable

(with bind (let x e) rest+)
~> (bind * * (with bind e)
     (lambda (x) (with bind rest+)))

(with bind e rest+)
~> (with bind (let _ e) rest+)

(with bind (erase x1 ... xn) rest+)
~> (erase (x1 ... xn) (with bind rest+))

(with bind e)
~> e

Notes

with can be understood as a generalization of begin in Scheme, or an explicit version of the do notation in Haskell.

The “borrowing” is covered by the first rule of the semantics; As you can see from the definition, this realization of borrowing works for any bind operation.

Note that the bind operator is not restricted to monadic bind; You can put anything here as long as the resulting term of this syntactic translation is well-typed.

The e in (let x e) is automatically surrounded by with.
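
As a small worked example (following the rules above, shown only schematically):

(with identity.bind
  (let x (i64 10))
  (i64.print x))

; ~> roughly:

(identity.bind * * (i64 10)
  (lambda (x) (i64.print x)))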

with-subject

with-subject evaluates its body while borrowing the specified variables as noemata, and returns a tuple of the borrowed variables together with the result of the body.

Example

(define length-internal
  ((s subject)
   (a tau)
   (xs (noema s (list a)))
   (acc i64))
  (witness i64
    (case-noetic xs
      ((list.nil)
        acc)
      ((list.cons _ ys)
        (length-internal s a ys (add-i64 acc 1))))))

(define length
  ((a tau)
   (xs (list a)))
  (witness (product (list a) i64)
    (with-subject s (xs)
      ((noesis.return s) i64 (length-internal s a xs 0)))))

Syntax

(with-subject LEAF (LEAF ... LEAF) TREE ... TREE)

Semantics

(with-subject s (x1 ... xn) e1 ... em)
~> (noesis.run *
     (lambda ((s subject))
       (with (noesis.bind s)
         ; cast to noema
         (let x1 (unsafe.cast t1 (noema s t1) x1))
         ...
         (let xn (unsafe.cast tn (noema s tn) xn))
         ; perform actual computation
         (let answer (with (noesis.bind s) e1 ... em))
         ; uncast from noema
         (let x1 (unsafe.cast (noema s t1) t1 x1))
         ...
         (let xn (unsafe.cast (noema s tn) tn xn))
         ; return result
         (tuple x1 ... xn answer))))

case, case-noetic

case performs pattern matching on a term of an algebraic data type; case-noetic performs pattern matching on a noema of an algebraic data type.

Example

(define-data option ((a tau))
  (none)
  (some (value a)))

(define from-option
  ((a tau)
   (x a)
   (m (option a)))
  (witness a
    (case m
      ((option.none)
        x)
      ((option.some y)
        y))))

(define-data list ((a tau))
  (nil)
  (cons (_ a) (_ (list a))))

(define length-internal
  ((s subject)
   (a tau)
   (xs (noema s (list a)))
   (acc i64))
  (witness i64
    (case-noetic xs
      ((list.nil)
        acc)
      ((list.cons _ ys)
        (length-internal s a ys (add-i64 acc 1))))))

Syntax

(case TREE
  (PATTERN TREE)
  ...
  (PATTERN TREE))

(case-noetic TREE
  (PATTERN TREE)
  ...
  (PATTERN TREE))

PATTERN := (LEAF ... LEAF)

Semantics

(case e clause-1 ... clause-n) ~> (case v clause-1 ... clause-n)               [i.e. reduce e into the value v]
(case-noetic e clause-1 ... clause-n) ~> (case-noetic v clause-1 ... clause-n) [i.e. reduce e into the value v]

(case (ci v1 ... vm)
  ((c1 x{1, 1} ... x{1, m1}) e1)
  ...
  ((cn x{n, 1} ... x{n, mn}) en)) ~> ei {x{i, 1} := v1, ..., x{i, mi} := vm}

(case-noetic (ci v1 ... vm)
  ((c1 x{1, 1} ... x{1, m1}) e1)
  ...
  ((cn x{n, 1} ... x{n, mn}) en)) ~> ei {x{i, 1} := v1, ..., x{i, mi} := vm}

Note

The clauses of a case / case-noetic must be ordered as in its corresponding definition in define-data:

(define-data option ((a tau))
  (none)
  (some (value a)))

; valid
(define from-option
  ((a tau)
   (x a)
   (m (option a)))
  (witness a
    (case m
      ((option.none)
        x)
      ((option.some y)
        y))))

; invalid
(define from-option
  ((a tau)
   (x a)
   (m (option a)))
  (witness a
    (case m
      ((option.some y)
        y)
      ((option.none)
        x))))

This restriction is not an essential one; The generalization simply has not been done yet because of other priorities.

Partial Application

Example

(define id ((a tau) (x a)) x)

; ordinary application
(reduce (id bool bool.true))

; partial application
(reduce ((id bool _) bool.true))

; of course, you can bind the partially-applied function to a variable
(define id-bool (id bool _))

(reduce (id-bool bool.true))

Semantics

(e e1 ... en)  [where e_{i1} = _, ..., e_{im} = _]
~> (λ (x1 ... xm) (e e1 ... en)) [replacing e_{ik} with xk]
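
For instance, the partial application above expands roughly as follows:

(id bool _)
~> (λ (x1) (id bool x1))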

Compiler Subcommands

The neut binary provides the subcommands in this section.

archive

Example

neut archive path/to/source/directory -o path/to/release/directory/0.2.0.0.tar.xz

Notes

The archive subcommand creates a tar.xz archive from the specified directory. The ensure statement expects an archive created by this subcommand.

build

Example

neut build path/to/file.neut             # create an executable ./file
neut build -o output path/to/file.neut --clang-option "-fsanitize=memory -g"   # create an executable ./output, using clang's option "-fsanitize=memory -g"
neut build --emit llvm path/to/file.neut # create a LLVM IR file ./file.ll
neut build --emit asm path/to/file.neut  # create an assembly code ./file.s

Notes

The build subcommand builds the given source code and emits the resulting code.

Internally, this command first creates LLVM IR, then passes it to clang (if necessary).

check

Example

neut check path/to/file.neut
neut check --no-color path/to/file.neut
neut check --end-of-entry EOE path/to/file.neut

Notes

The check subcommand type-checks the given file.

If the --no-color option is specified, the result of type checking is printed without console colors.

If --end-of-entry SEPARATOR is specified, each entry of the result of type checking is followed by SEPARATOR.

This subcommand is intended to be used with an editor-side syntax checker like flycheck.

Contribution

Development

The Structure of the Compiler

See the following super-ultra-hyper-miracle-romantic diagram:

            (input string)

src/Preprocess.hs  ↓

                 Tree         ... any AST

src/Parse.hs       ↓

               WeakTerm       ... a term in the object language, with possible holes

src/Elaborate.hs   ↓

                 Term         ... a term in the object language

src/Clarify.hs     ↓

                 Comp         ... a term in the intermediate language

src/Lower.hs       ↓

                LowComp       ... LLVM IR (AST)

src/Emit.hs        ↓

            (output string)   ... LLVM IR (string)

clang              ↓

             (object code)

Some words on each of the foundational modules:

src/Preprocess.hs reads the given file and generates a list of ASTs, expanding all the macros. More specifically, it first parses the string in the given file into a list of trees ([Tree]), visiting other files if necessary. After that, the module interprets those trees as terms of the meta calculus ([MetaTerm]) and tries to normalize them. For each of them, if the result of normalization is an AST (that is, a term that consists only of leaves and nodes), src/Preprocess.hs translates the MetaTerm into a Tree so that it can be parsed as a term of the object language later. If the result is not an AST value (e.g. (lambda (x) x) is not an AST value since it contains the syntactic construct lambda), it simply reports an error.

src/Parse.hs parses the given list of ASTs into terms of the object language.

src/Elaborate.hs performs type inference. The module src/Elaborate/Infer.hs generates type constraints, whereas src/Elaborate/Unify.hs resolves them, creating a substitution. Using the resulting substitution, src/Elaborate.hs resolves all the unknown holes (*) into actual terms. If you read src/Elaborate/Unify.hs, you will note that the constraints are not of the form (e1, e2) but of the form ((e1, e2), (e1', e2')). This is for better error reporting; The right element (e1', e2') is technically not necessary for type inference.

src/Clarify.hs (1) specifies the evaluation order of the given term, (2) performs closure conversion, and (3) linearizes all the occurrences of the variables. More specifically: (1) Consider a term (lambda x. x * x) @ (1 + 1). This term can be reduced into (lambda x. x * x) @ 2 or into (1 + 1) * (1 + 1), depending on the evaluation order. The module clarifies this ambiguity by specifying the evaluation order in a call-by-value manner. (2) The closure conversion is the usual one, except for the fact that closures are generalized into 3-element tuples. (3) Linearization is a process that copies/discards terms along their types so that every variable is used linearly. The actual implementation of this linearization can be found in src/Clarify/Linearize.hs.

src/Lower.hs translates clarified terms into LLVM IR. The point here is that every value is cast into i8* (the void pointer) and cast back to its actual type only when necessary. This approach enables us to realize polymorphic functions.

src/Emit.hs simply translates the LLVM IR into an actual byte string, using Data.ByteString.Builder instead of ByteString just for performance.

Donation

Feel free to buy me an apple juice:

bitcoin:bc1q689xghadqlrvn05w0hzkm6tswlc8py7nrtqz76
monero:4B8pz5KokF62tMREEYZ4eEhNVWxhXMVcADXvo7Anp6VRW4KGqstCdHMjZ3xyW7UutYbvdce3NYQNyT7LcNf3qr7R5GNxZoE

Things Left Undone

  • incremental compilation
  • syntax highlighter for editors other than Emacs
  • primitive-level exception
  • better pattern matching syntax (like nested pattern)
  • wrapper library for parallel computation
  • track the location of a code piece when expanding macros more accurately
  • sophistication on library-related things
  • (Should I remove the meta-language and implement syntactic constructs simply in Parse.Interpret?)
  • etc.
