alpaca-lang / alpaca Goto Github PK

Functional programming inspired by ML for the Erlang VM

License: Other

Erlang 94.79% Shell 0.11% Makefile 0.07% OCaml 5.03%

ml erlang erlang-vm alpaca statically-typed hindley-milner

alpaca's Introduction

Alpaca

Alpaca is a statically typed, strict/eagerly evaluated, functional programming language for the Erlang virtual machine (BEAM). At present it relies on type inference but does provide a way to add type specifications to top-level function and value bindings. It was formerly known as ML-flavoured Erlang (MLFE).

TLDR; How Do I Use It?

Make sure the following are installed:

Erlang OTP 19.3 or above (packages from Erlang Solutions, most development at present uses OTP 19.3 and 20.0 locally from kerl)
Rebar3
a build of Alpaca itself

Installing Alpaca

Releases for OTP 19.3 and 20.0 are built by Travis CI and are available under this repository's releases page here. You will want one of the following:

alpaca_19.3.tgz
alpaca_20.0.tgz

You can unpack these anywhere and point the environment variable ALPACA_ROOT at the base folder, or place the beams sub-folder in any of the following locations:

/usr/lib/alpaca
/usr/local/lib/alpaca
/opt/alpaca

Please see the rebar3 plugin documentation for more details.

Using Alpaca in a Project

Make a new project with rebar3 new app your_app_name and in the rebar.config file in your project's root folder (e.g. your_app_name/rebar.config) add the following:

{plugins, [
    {rebar_prv_alpaca, ".*", {git, "https://github.com/alpaca-lang/rebar_prv_alpaca.git", {branch, "master"}}}
]}.

{provider_hooks, [{post, [{compile, {alpaca, compile}}]}]}.

Check out the tour for the language basics, put source files ending in .alp in your source folders, run rebar3 compile and/or rebar3 eunit.

Building and Using Your Own Alpaca

Rather than using an official build, you can build and test your own version of Alpaca. Please note that Alpaca now needs itself in order to build. The basic steps are:

Clone and/or modify Alpaca to suit your needs.
Compile your build with rebar3 compile.
Make a local untagged release for your use with bash ./make-release.sh in the root folder of Alpaca.

Then export ALPACA_ROOT, e.g. in the Alpaca folder:

export ALPACA_ROOT=`pwd`/alpaca-unversioned_`

The rebar3 plugin should now find the Alpaca binaries you built above.

Editor Support

Alpaca plugins are available for various editors.

Emacs: alpaca-mode
Vim: alpaca_vim
Visual Studio Code: alpaca-vscode

Intentions/Goals

Something that looks and operates a little bit like an ML on the Erlang VM with:

Static typing of itself. We're deliberately ignoring typing of Erlang code that calls into Alpaca.
Parametric polymorphism
Infinitely recursive functions as a distinct and allowable type for processes looping on receive.
Recursive data types
Syntax somewhere between OCaml and Elm
FFI to Erlang code that does not allow the return of values typed as term() or any()
Simple test annotations for something like eunit, tests live beside the functions they test

The above is still a very rough and incomplete set of wishes. In future it might be nice to have dialyzer check the type coming back from the FFI and suggest possible union types if there isn't an appropriate one in scope.

What Works Already

Type inferencer with ADTs. Tuples, maps, and records for product types and unions for sum. Please note that Alpaca's records are not compatible with Erlang records as the former are currently compiled to maps.
Compile type-checked source to .beam binaries
Simple FFI to Erlang
Type-safe message flows for processes defined inside Alpaca

Here's an example module:

module simple_example

-- a basic top-level function:
let add2 x = x + 2

let something_with_let_bindings x =
  -- a function:
  let adder a b = a + b in
  -- a variable (immutable):
  let x_plus_2 = adder x 2 in
  add2 x

-- a polymorphic ADT:
type messages 'x = 'x | Fetch pid 'x

{- A function that can be spawned to receive `messages int`
    messages, that increments its state by received integers
    and can be queried for its state.
-}
let will_be_a_process x = receive with
    i -> will_be_a_process (x + i)
  | Fetch sender ->
    let sent = send x sender in
    will_be_a_process x

let start_a_process init = spawn will_be_a_process init

Licensing

Alpaca is released under the terms of the Apache License, Version 2.0

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

 http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Contributions and Help

Please note that this project is released with a Contributor Code of Conduct, version 1.4. By participating in this project you agree to abide by its terms. See code_of_conduct.md for details.

You can join #alpaca-lang on freenode to discuss the language (directions, improvement) or get help. This IRC channel is governed by the same code of conduct detailed in this repository.

Pull requests with improvements and bug reports with accompanying tests welcome.

Using It

It's still quite early in Alpaca's evolution but the tests should give a relatively clear picture as to where we're going. test_files contains some example source files used in unit tests. You can call alpaca:compile({files, [List, Of, File, Names, As, Strings]}, [list, of, options]) or alpaca:compile({text, CodeAsAString}, [options, again]) for now but generally we recommend using the rebar3 plugin.

Supported options are:

'test' - This option will cause all tests in a module to be type checked and exported as functions that EUnit should pick up.
{'warn_exhaustiveness', boolean()} - If set to true (the default), the compiler will print warnings regarding missed patterns in top level functions.

Errors from the compiler (e.g. type errors) are almost comically hostile to usability at the moment. See the tests in alpaca_typer.erl.

Prerequisites

You will generally want the following two things installed:

Erlang/OTP 19.3 or above (packages from Erlang Solutions, most development so far uses OTP 19.3 and 20.0 locally from kerl)
Rebar3

Writing Alpaca with Rebar3

Thanks to @tsloughter's Alpaca Rebar3 plugin it's pretty easy to get up and running.

Make a new project with Rebar3 (substituting whatever project name you'd like for alpaca_example):

$ rebar3 new app alpaca_example
$ cd alpaca_example

In the rebar.config file in your project's root folder add the following (borrowed from @tsloughter's docs):

{plugins, [
    {rebar_prv_alpaca, ".*", {git, "https://github.com/alpaca-lang/rebar_prv_alpaca.git", {branch, "master"}}}
]}.

{provider_hooks, [{post, [{compile, {alpaca, compile}}]}]}.

Now any files in the project's source folders that end with the extension .alp will be compiled and included in Rebar3's output folders (provided they type-check and compile successfully of course). For a simple module, open src/example.alp and add the following:

module example

export add/2

let add x y = x + y

The above is just what it looks like: a module named example with a function that adds two integers. You can call the function directly from the Erlang shell after compiling like this (note alpaca prepends alpaca_ to the module name, so in the erlang shell you must explicitly add this):

$ rebar3 shell
... compiler output skipped ...
1> alpaca_example:add(2, 6).
8
2>

Note that calling Alpaca from Erlang won't do any type checking but if you've written a variety of Alpaca modules in your project, all their interactions with each other will be type checked and safe (provided the compile succeeds).

Compiler Hacking

If you have installed the prerequisites given above, clone this repository and run tests and dialyzer with:

rebar3 eunit
rebar3 dialyzer

There's no command line front-end for the compiler so unless you use @tsloughter's Rebar3 plugin detailed in the previous section, you will need to boot the erlang shell and then run alpaca:compile/2 to build and type-check things written in Alpaca. For example, if you wanted to compile the type import test file in the test_files folder:

rebar3 shell
...
1> Files = ["test_files/basic_adt.alp", "test_files/type_import.alp"].
2> alpaca:compile({files, Files}, []).

This will result in either an error or a list of tuples of the following form:

{compiled_module, ModuleName, FileName, BeamBinary}

The files will not actually be written by the compiler so the binaries described by the tuples can either be loaded directly into the running VM (see the tests in alpaca.erl) or written manually for now unless of course you're using the aforementioned rebar3 plugin/

Built-In Stuff

Most of the basic Erlang data types are supported:

booleans, true or false
atoms, :atom, :"Quoted Atom!"
floats, 1.0
integers, 1
strings, "A string". These are encoded as UTF-8 binaries.
character lists, like default Erlang strings, c"characters here"
lists, [1, 2, 3] or 1 :: 2 :: [3]
binaries, <<"안녕, this is some UTF-8 text": type=utf8>>, <<1, 2, 32798: type=int, size=16, signed=false>>, etc
tuples, ("a", :tuple, "of arity", 4)
maps (basic support), e.g. #{:atom_key => "string value"}. These are statically typed as lists are (generics, parametric polymorphism).
records (basic support), these look a bit like OCaml and Elm records, e.g. {x=1, hello="world"} will produce a record with an x: int and hello: string field. Please see the language tour for more details.
pids, these are also parametric (like lists, "generics"). If you're including them in a type you can do something like type t = int | pid int for a type that covers integers and processes that receive integers.

In addition there is a unit type, expressed as ().

Note that the tuple example above is typed as a tuple of arity 4 that requires its members to have the types string, atom, string, integer in that order.

On top of that you can define ADTs, e.g.

type try 'success 'error = Ok 'success | Error 'error

And ADTs with more basic types in unions work, e.g.

type json = int | float | string | bool
          | list json
          | list (string, json)

Types start lower-case, type constructors upper-case.

Integer and float math use different symbols as in OCaml, e.g.

1 + 2      -- ok
1.0 + 2    -- type error
1.0 + 2.0  -- type error
1.0 +. 2.0 -- ok

Basic comparison functions are in place and are type checked, e.g. > and < will work both in a guard and as a function but:

1 > 2             -- ok
1 < 2.0           -- type error
"Hello" > "world" -- ok
"a" > 1           -- type error

See src/builtin_types.hrl for the included functions.

Pattern Matching

Pretty simple and straightforward for now:

let length l = match l with
    [] -> 0
  | h :: t -> 1 + (length t)

The first clause doesn't start with | since it's treated like a logical OR.

Pattern match guards in clauses essentially assert types, e.g. this will evaluate to a t_bool type:

match x with
  b, is_bool b -> b

and

match x with
  (i, f), is_integer i, is_float f -> :some_tuple

will type to a tuple of integer, float.

Since strings are currently compiled as UTF-8 Erlang binaries, only the first clause will ever match:

type my_binary_string_union = binary | string

match "Hello, world" with
    b, is_binary b -> b
  | s, is_string s -> s

Further, nullary type constructors are encoded as atoms and unary constructors in tuples led by atoms, e.g.

type my_list 'x = Nil | Cons ('x, my_list 'x)

Nil will become 'Nil' after compilation and Cons (1, Nil) will become {'Cons', {1, 'Nil'}}. Exercise caution with the order of your pattern match clauses accordingly.

Maps

No distinction is made syntactically between map literals and map patterns (=> vs := in Erlang), e.g

match my_map with
  #{:a_key => some_val} -> some_val

You can of course use variables to match into a map so you could write a simple get-by-key function as follows:

type my_opt 'a = Some 'a | None

let get_by_key m k =
  match m with
      #{k => v} -> Some v
    | _ -> None

Modules (The Erlang Kind)

ML-style modules aren't implemented at present. For now modules in Alpaca are the same as modules in Erlang with top-level entities including:

a module name (required)
function exports (with arity, as in Erlang)
type imports (e.g. use module.type)
type declarations (ADTs)
functions which can contain other functions and variables via let bindings.
functions are automatically curried (with some limitations)
simple test definitions

An example:

module try

export map/2  -- separate multiple exports with commas

-- type variables start with a single quote:
type maybe_success 'error 'ok = Error 'error | Success 'ok

-- Apply a function to a successful result or preserve an error.
let try_map e f = match e with
    Error _ -> e
  | Success ok -> Success (f ok)

Tests

Tests are expressed in an extremely bare-bones manner right now and there aren't even proper assertions available. If the compiler is invoked with options [test], the following will synthesize and export a function called add_2_and_2_test:

let add x y = x + y

test "add 2 and 2" =
  let res = add 2 2 in
  assert_equal res 4

let assert_equal x y =
  match x == y with
    | true -> :ok
    | _ -> throw (:not_equal, x, y)

Any test that throws an exception will fail so the above would work but if we replaced add/2 with add x y = x + (y + 1) we'd get a failing test. If you use the rebar3 plugin mentioned above, rebar3 eunit should run the tests you've written. There's a bug currently where the very first test run won't execute the tests but all runs after will (not sure why yet).

The expression that makes up a test's body is type inferenced and checked. Type errors in a test will always cause a compilation error.

Processes

An example:

let f x = receive with
  (y, sender) ->
    let z = x + y in
    let sent = send z sender in
  f z

let start_f init = spawn f init

All of the above is type checked, including the spawn and message sends. Any expression that contains a receive block becomes a "receiver" with an associated type. The type inferred for f above is the following:

{t_receiver,
  {t_tuple, [t_int, {t_pid, t_int}]},
  {t_arrow, [t_int], t_rec}}

This means that:

f has it's own function type (the t_arrow part) but it also contains one or more receive calls that handle tuples of integers and PIDs that receive integers themselves.
f's function type is one that takes integers and is infinitely recursive.

send returns unit but there's no "do" notation/side effect support at the moment hence the let binding. spawn for the moment can only start functions defined in the module it's called within to simplify some cross-module lookup stuff for the time being. I intend to support spawning functions in other modules fairly soon.

Note that the following will yield a type error:

let a x = receive with
  i -> b x + i

let b x = receive with
  f -> a x +. i

This is because b is a t_float receiver while a is a t_int receiver. Adding a union type like type t = int | float will solve the type error.

If you spawn a function which nowhere in its call graph posesses a receive block, the pid will be typed as undefined, which means all message sends to that process will be a type error.

Current FFI

The FFI is quite limited at present and operates as follows:

beam :a_module :a_function [3, "different", "arguments"] with
    (ok, _) -> :ok
  | (error, _) -> :error

There's clearly room to provide a version that skips the pattern match and succeeds if dialyzer supplies a return type for the function that matches a type in scope (union or otherwise). Worth noting that the FFI assumes you know what you're doing and does not check that the module and function you're calling exist.

Localization

Compiler error messages may be localized by calling alpaca_error_format:fmt/2. If no translation is available in the specified locale, the translation for en_US will be used.

Localization is performed using gettext ".po" files stored in priv/lang. To add a new language, say Swedish (sv_SE), create a new file priv/lang/alpaca.sv_SE.po. If you use Poedit, you may then import all messages to be translated by selecting "Catalog -> Update from POT file..." in the menu, and then pick priv/lang/alpaca.pot. The messages may be a bit cryptic. Use the en_US as an aid to understand them.

The POT file is automatically updated whenever alpaca is compiled. Updates to po-files are also picked up at the compile phase.

Problems

What's Missing

A very incomplete list:

self() - it's a little tricky to type. The type-safe solution is to spawn a process and then send it its own pid. Still thinking about how to do this better.
exception handling (try/catch)
any sort of standard library. Biggest missing things right now are things like basic string manipulation functions and adapters for gen_server, etc.
anything like behaviours or things that would support them. Traits, type classes, ML modules, etc all smell like supersets but we don't have a definite direction yet.
simpler FFI, there's an open issue for discussion: #7
annotations in the BEAM file output (source line numbers, etc). Not hard based on what can be seen in the LFE code base.
support for typing anything other than a raw source file.
side effects, like using ; in OCaml for printing in a function with a non-unit result.

Implementation Issues

This has been a process of learning-while-doing so there are a number of issues with the code, including but not limited to:

there's a lot of cruft around error handling that should all be refactored into some sort of basic monad-like thing. This is extremely evident in alpaca_ast_gen.erl and alpaca_typer.erl. Frankly the latter is begging for a complete rewrite.
type unification error line numbers can be confusing. Because of the sequence of unification steps, sometimes the unification error might occur at a function variable's location or in a match expression rather than in the clauses. I'm considering tracking the history of changes over the course of unifications in a reference cell in order to provide a typing trace to the user.
generalization of type variables is incompletely applied.

Parsing Approach

Parsing/validating occurs in several passes:

yecc for the initial rough syntax form and basic module structure. This is where exports and top-level function definitions are collected and the initial construction of the AST is completed.
Validating function definitions and bindings inside of them. This stage uses environments to track whether a function application is referring to a known function or a variable. The output of this stage is either a module definition or a list of errors found. This stage also renames variables internally.
Type checking. This has some awkward overlaps with the environments built in the previous step and may benefit from some interleaving at some point. An argument against this mixing might be that having all functions defined before type checking does permit forward references.

AST Construction

Several passes internally

for each source file (module), validate function definitions and report syntax errors, e.g. params that are neither unit nor variable bindings (so-called "symbols" from the yecc parser), building a list of top-level internal-only and exported functions for each module. The output of this is a global environment containing all exported functions by module and an environment of top-level functions per module or a list of found errors.
for each function defined in each module, check that every variable and function reference is valid. For function applications, arity is checked where the function applied is not in a variable.

Type Inferencing and Checking

At present this is based off of the sound and eager type inferencer in http://okmij.org/ftp/ML/generalization.html with some influence from https://github.com/tomprimozic/type-systems/blob/master/algorithm_w where the arrow type and type schema instantiation are concerned.

Single Module Typing

module example

export add/2

let add x y = adder x y

let adder x y = x + y

The forward reference in add/2 is permitted but currently leads to some wasted work. When typing add/2 the typer encounters a reference to adder/2 that is not yet bound in its environment but is available in the module's definition. The typer will look ahead in the module's definition to determine the type of adder/2, use it to type add/2, and then throw that work away before proceeding to type adder/2 again. It may be beneficial to leverage something like ETS here in the near term.

Recursion

Infinitely recursive functions are typed as such and permitted as they're necessary for processes that loop on receive. Bi-directional calls between modules are disallowed for simplicity. This means that given module A and B, calls can occur from functions in A to those in B or the opposite but not in both directions.

I think this is generally pretty reasonable as bidirectional references probably indicate a failure to separate concerns but it has the additional benefit of bounding how complicated inferencing a set of mutually recursive functions can get. The case I'm particularly concerned with can be illustrated with the following Module.function examples:

let A.x = B.y ()
let B.y = C.z ()
let C.z = A.x ()

This loop, while I belive possible to check, necessitates either a great deal of state tracking complexity or an enormous amount of wasted work and likely has some nasty corner cases I'm as yet unaware of.

The mechanism for preventing this is simple and relatively naive to start: entering a module during type inferencing/checking adds that module to the list of modules encountered in this pass. When a call occurs (a function application that crosses module boundaries), we check to see if the referenced module is already in the list of entered modules. If so, type checking fails with an error.

No "Any" Type

There is currently no "any" root/bottom type. This is going to be a problem for something like a simple println/printf function as a simple to use version of this would best take a List of Any. The FFI to Erlang code gets around this by not type checking the arguments passed to it and only checking the result portion of the pattern matches.

alpaca's People

Contributors

Stargazers

Watchers

alpaca's Issues

error, exit, throw and the typing of them

Need these three for lots of things, not least of which is being able to write stuff like basic test matchers without resorting to the FFI, something like

raise_error <term>
raise_exit <term> (only erlang:exit/1 for now)
raise_throw <term>

Maybe it makes sense to introduce special terms instead along the lines of error, exit, and throw, just not sure if those should be reserved words or not.

I'm proposing that these three are parametric error types potentially of the form t_err 'kind 'awhere 'kind is one of error, exit, or throw. 'a would be the type of the term used in the various user defined occurrences of t_err. E.g. raise_error :bad_arith would type to {t_err, error, t_atom}.

Unification:

with other t_err terms uses unification as normal. Unifying {t_err, throw, t_atom} with {t_err, error, t_atom} is a type error as would be {t_err, error, t_atom} and {t_err, error, t_string} without a type in scope that unifies them.
with non-t_err terms the errors unify to the other type. E.g. unifying t_int with {t_err, throw, t_string} yields t_int for both types. I think this leaves open the option later to parameterize every type with the potentially raised errors below, somewhat like receivers.

The latter of these allows the following to type to 'a -> 'a -> t_bool:

assert_equal a b = match (a == b) with
    true  -> true
  | false -> raise_throw (not_equal, a, b)

Polymorphic functions that pull record items aren't constraining the result type properly

The following three tests fail, we should expect that a get_x not returning an option won't unify with my_map/2's second argument and thus fail but the typer accepts the integer argument instead of rejecting it. I have not yet confirmed or denied that this behaviour is limited to records.

    , fun() ->
              Code =
                  "module fun_pattern_with_adt\n\n"
                  "type option 'a = None | Some 'a\n\n"
                  "my_map _ None = None\n\n"
                  "my_map f Some a = Some (f a)\n\n"
                  "doubler x = x * x\n\n"
                  "foo = my_map doubler 2",
              ?assertMatch(
                 {error, {cannot_unify, _, _, #adt{}, t_int}},
                 module_typ_and_parse(Code))
      end
    , fun() ->
              Code =
                  "module fun_pattern_with_adt\n\n"
                  "type option 'a = None | Some 'a\n\n"
                  "my_map _ None = None\n\n"
                  "my_map f Some a = Some (f a)\n\n"
                  "doubler x = x * x\n\n"
                  "get_x {x=x} = x\n\n"
                  "foo () = "
                  "  let rec = {x=1, y=2} in "
                  "  my_map doubler (get_x rec)",
              ?assertMatch(
                 {error, {cannot_unify, _, _, #adt{}, t_int}},
                 module_typ_and_parse(Code))
      end
    , fun() ->
              Code =
                  "module fun_pattern_with_adt\n\n"
                  "type option 'a = None | Some 'a\n\n"
                  "my_map _ None = None\n\n"
                  "my_map f Some a = Some (f a)\n\n"
                  "doubler x = x * x\n\n"
                  "get_x rec = match rec with {x=x} -> x\n\n"
                  "foo () = "
                  "  let rec = {x=1, y=2} in "
                  "  my_map doubler (get_x rec)",
              ?assertMatch(
                 {error, {cannot_unify, _, _, #adt{}, t_int}},
                 module_typ_and_parse(Code))
      end

Referring to missing types does not generate an error

module x

type a = int | b

This doesn't generate an error about an undefined type b and it should.

Thanks to @danabr for pointing this out in IRC

Can't refer to a type qualified with a module name

Given

module m
type t = int

The following yields a syntax error but should be allowed:

module n
type u = m.t

This is especially important since the changes discussed in #62 won't permit m.t to be imported but should still allow it to be referenced.

AST changes for function and type bindings

@danabr raised a point about type definitions vs types themselves in PR #116 , e.g.

type opt 'a = Some 'a | None
type int_opt = opt int

The name in the left-hand side of each of these could be viewed in a similar manner as function names in let bindings, that is, independent of their variables and members/bodies whereas opt int is a concrete type. Similarly as raised by @ypaq and others elsewhere:

let f x = x + x

could be viewed as syntactic sugar for

let f = fun x -> x + x

or something to that effect.

We might then add an alpaca_type_def node that binds variables and members to a type name while a member of a type binding stays as an alpaca_type AST node since it is in fact a concrete type. Functions then might be decomposed into two things as well:

an alpaca_fun node that lists function versions, each of which has their associated variables and bodies.
alpaca_fun_def that binds a name and arity (the latter for convenience) to a single alpaca_fun AST node.

This should make lambdas fairly simple to implement and makes types, functions, and values all operate in a similar manner.

Thoughts?

Importing a type but not one it depends on causes an error

Failing test illustrating the issue committed in master on my fork: https://github.com/j14159/alpaca/blob/master/src/alpaca_typer.erl#L4197

OTP versions supported officialy?

I've seen in the README.md that there is only “official” support for 18.x because thats what you use @j14159. Personally I have to handle multiple versions on my system and swap back and forth all the time, and as such like to try to maintain compatibility in a given range. I already have an internal project which I have to keep compatible from 16B3 to current, but I do hope beeing able to drop 16 before years end.

Also I am trying to set up travis on my fork and wanted to know which versions of erlang I shall test against.

The current version does compile against the latest minor release of 17, 18 and 19, where 17 fails during compilation phase and 18 and 19 both pass all tests, but fail in dialyzer phase. For 17 there weren't even a dialyzer run, because of known trouble with modules generated from xrl and yrl files before OTP 18.

I can reproduce the exact same behaviour on my local system.

From what I have observed on my system and at travis, I'd guess supported range for now should be 18 and 19, while the tests should be run on 18.2, 18.3 and 19.0 at least, dropping 18.2 as soon 19.1 has been released. But I will follow what ever you suggest here.

I will do a WIP-PR in a couple of minutes.

Improve type and function sharing between modules

Short list, driven by discussion on PR #61 with @danabr and from @lepoetemaudit's infix function work:

all type names available by default but not implementations (e.g. type constructors). Exporting a type exposes its implementation. This is to let us use a module's type in other modules' types without needing the details (information hiding, abstract/opaque types).
individual functions used in other modules without qualifying them with their module name, this will make infix operators more useful.

Per @danabr we should consider a single import directive that allows importing specific functions, types, or even a subset of a type's constructors.

Ideas/expansions/criticisms most welcome.

Community?

I wonder if you plan to make a mailinglist/irc channel or some such to talk about the language. I and a few other people are very interested in discussing features and the like, but issues seem to be the wrong place to do it.

Make types module local by default

As raised by @danabr in issue #34.

Type expression parsed differently whether a builtin type is used or not.

This code parses correctly:

type my_map = map atom atom

This one fails with ["syntax error before: ",[]]:

type my_atom = atom
type my_map = map my_atom atom

This again parses fine:

type my_atom = atom
type my_map = map (my_atom) atom

i.e, type my_map = map my_atom atom is parsed as type my_map = map (my_atom atom) .

rebar3 plugin

I'm excited to play around and hopefully help with mlfe. To start I began the process of creating a rebar3 plugin :) https://github.com/tsloughter/rebar_prv_mlfe

On a side note, I'd suggest either adding rebar.lock to the .gitignore of mlfe or just committing it even though it is empty. Hmm, maybe rebar3 should stop outputting it if it is empty though... I'll think about that :)

Anyway, just letting you know, so feel free to close this issue after reading it.

Guards in function heads

Trying to compile @j14159's example code from another issue:

module guards
type make_it_work = int | string
let f x, is_int x = x + 1
let f x, is_string s = string_append "hello, x"

we get {error,{3,alpaca_parser,["syntax error before: ","','"]}}

For consistency with match expresssions, guards should be allowed in function heads (just like in Erlang).

Side note: Personally, I would rather drop guards completely from match expressions, since guards disable exhaustiveness checks ("Having a compiler warn about non-exaustive guards would be impossible in the general case, as it would involve solving the halting problem" (http://stackoverflow.com/a/7109455/347687)), and rather have an if expression.

Consider prefixing compiled Alpaca modules

When Elixir compiles modules, it prefixes the generated module name with 'Elixir.' (more info here). This means that any modules compiled with Elixir don't risk clashing with Erlang ones, and it keeps the Elixir standard library nicely namespaced. From within Elixir code, you don't need to use the prefix.

I believe the same would be useful in Alpaca - perhaps automatically prefix every generated module with alpaca_. It means a bit of extra typing when calling from the Erlang side, but it also means we could give nice names to Alpaca standard library modules like string instead of having to import e.g. alpaca_string within Alpaca code.

Support types with same name and different arities

As a user, I'd like to define a type foo/1 as well as a type foo/2:

type foo 'a = Nothing | Just 'a

type foo 'a 'b = Left 'a | Right 'b

More docs on calling Alpaca from Erlang

Specific items:

modules Alpaca generates are prefixed with alpaca_
to pattern match correctly, records and maps passed to Alpaca need a particular __struct__ field to exist.

Boolean operator support

As far as I can tell, we cannot do things like (x == 1) || (y < 4) and (x == 1) && (y < 4).

User defined types should be parameterizable with builtin types

Consider this example:

module user_defined_types

type proplist 'k 'v = list ('k, 'v)

type optlist 'v = proplist atom 'v

This fails with:

{error, {badmatch, {error, {6,mlfe_parser, ["syntax error before: ",["\"atom\""]]}}}

The parser expects a user defined type to only take type variables. See poly_type in mlfe_paser.yrl.

mlfe_typer also has this assumption.

Individual record field access

At the moment we can only access members of records via a pattern match but we need to be able to do

let r = {x=1, y=2} in
  r.x + 2

I think this will require rewriting the AST in or before the code generation stage to put a pattern match up front, e.g. in Erlang:

case r of
    #{'__struct__' := record, 'x' := R_x} ->  R_x + 2

Updates to records

Need a way to update or add fields to records, e.g.

let r = {x=1, y=2} in
  {r | z=3}

Union type as ADT argument

The following test case fails:

union_type_as_adt_arg_test() ->
    Code = "module adt\n\n"
           "type union = int | atom\n\n"
           "type t = Union union\n\n"
           "make () = Union 1",
    ?assertMatch({ok, _}, module_typ_and_parse(Code)).

Error:

{error, {cannot_unify,adt,7,  {adt,"union",[],[]}, t_int}}

It seems like the knowledge about what members are in the union got lost along the way.

This is the line reporting the error: https://github.com/j14159/mlfe/blob/master/src/mlfe_typer.erl#L557

Unification failure when subexpression is put in argument position

Consider this function:

duplicate count el =
  match count with
    0 -> []
  | _ -> el :: (duplicate (count-1) el)

Compiling it crashes with:

exception error: no match of right hand side value 
                 {error,{cannot_unify,example,6,
                                      {t_arrow,[<0.90.0>],<0.91.0>},
                                      t_int}}
  in function  mlfe:compile/2 ([...]/mlfe/src/mlfe.erl, line 55)

However, this type checks and compiles just fine:

duplicate count el =
  match count with
    0 -> []
  | _ ->
    let next_count = count - 1 in
    el :: (duplicate next_count el)

VIm syntax highlighting

Hello! This would be ace for us vim users.

You're probably too busy (and an emacs user) to work on this, but I thought I'd note this down for the future :)

FFI Bridge Proposal

Bridges are to MLFE as ports are to Elm, without the send/receive and subscription semantics.

This is motivated by questions from @imetallica, discussion and feedback from @omarkj, and naming concerns from @lpil.

Example:

bridge append_ints = :erlang :"++" [list int, list int] list_int

Given the above in a module, the compiler will synthesize the function append_ints, typed to take to integer lists and return one that is a combination of both:

{t_arrow, [{t_list, t_int}, {t_list, t_int}], {t_list, t_int}}

The typer will trust that the author has considered the types involved and will expose this function for type checking. The code generator will create this function in the output Core Erlang AST and programmatically create the necessary checks for the return value. If we follow what Elm has done, this will create some substantial overhead on any recursive type like lists, maps, and recursive ADTs as each element must be checked before returning the result to MLFE code. A more problematic example:

type maybe_io_device = Ok pid unit | Error atom

bridge open_file = :file :open [string, list atom] io_device

If you refer to the erldocs for file:open/2, you'll notice the types I've given above to the bridge are incomplete, for example I'm not accounting for the fd() type which in the given docs doesn't appear to devolve to a pid. A larger issue is that currently the compiler would render the maybe_io_device ADT as either {'Ok', Pid} or {'Error', ErrorAtom} in any pattern match checking the validity of the return. This is relatively trivial to change and may make sense for simpler handling of common Erlang patterns directly as ADTs with no intermediary translation layer at all.

More specifically, given the changes to how ADTs are rendered, the code above would be synthesized to the following in the code generator:

open_file(Filename, Modes) ->
    case file:open(Filename, Modes) of
        {ok, IO}=Ok when is_pid(IO) -> Ok;
        {error, Reason}=Err when is_atom(Reason) -> Err
    end.

This has rather large safety implications:

do we let this explode on the Erlang side unchecked, somewhat similarly to Elm?
do we generate code that is already wrapped in a try/catch to account for errors, utilizing some sort of built-in type like type try 'x = Success 'x | Error erlang_exception? I use this type in Scala but does this remove Erlang-ness from the language?
should we have a default safe mode as in the previous point and a keyword to remove the try/catch, e.g. unsafe bridge open_file = ... that doesn't wrap the result?
should bridges always be unsafe but any that occur without a surrounding try/catch raise a compiler warning?

I'm leaning towards point 3 at the moment but curious about other opinions and would like to know if I've missed anything (beyond the complexity checking recursive structures entails).

Specifying concrete types as type parameters instead of vars fails typing

This fails, the typer tries to unify t_int and undefined:

module n
type opt 'a = Some 'a | None
type u = U opt int
let f () = U Some 1

This also fails with the same basic unification error:

module m
export_type t
type t 'a = T 'a

module n
type u 'a = U m.t 'a
let f () = U m.T 1

I haven't dug in too deeply yet but I expect what's happening is that when we do:

type option 'a = Some 'a | None
type something_else = option int

The parser has no idea that int is supposed to be assigned to a variable 'a (or any variable, for that matter) and so trying to get that variable from the vars proplist in an #adt{} yields undefined.

I think the fix might be pretty simple: when we look through the parameters given for a type that's a member of another type, we just manufacture a new type variable for each type expression that isn't already a type var.

Type aliasing

Consider the following example:

module shape

type radius = int

type shape = Circle radius

make_circle r = Circle r

test_circle () = make_circle 1

This unexpectedly fails to typecheck with the error:

exception error: no match of right hand side value 
                 {error,{cannot_unify,shape,9,{adt,"radius",[],[]},t_int}}

That is, radius is considered a distinct type from int.

The same module in OCaml compiles just fine:

type radius = int

type shape = Circle of radius

let make_circle r = Circle r

let test_circle () = make_circle 1

I think it makes sense for MLFE to behave the same way.

If I would like to make the radius type abstract (or opaque in dialyzer terms), I would hide it in a module (example in OCaml):

module Radius : sig
  type radius

  val make_radius: int -> radius
end =
struct
  type radius = int

  let make_radius i = i
end

type shape = Circle of Radius.radius

let make_circle r = Circle r

let test_circle () = make_circle (Radius.make_radius 1)

To achieve the same in MLFE, types should be module local by default, and you would have to export them via a export_type directive or similar. We would also need to be able to mark types as abstract. In Erlang/Dialyzer this is achieved by using the -opaque directive.

Function types in ADTs

We currently have no way to describe a function as a member of an ADT. Something pretty simple to start I think, along the lines of most ML-like things I've seen e.g. type add = int -> int -> int

Idea: skip process handling

Hi, cool project.

Just an idea: Why not skip support for message send/receive?
Mostly you use the gen_server.erl anyway and if you really want to
send/receive you encapsulate that in your own Erlang module anyway.

edoc spec extraction

We might want to look into how edoc extracts specs of functions (which it then includes in the generated documentation).

Top level values (or nullary functions) are not usable at present

Currently, it is possible to define, but not use, zero-argument (nullary) functions. For example, the following throws a compilation error:

module example

x = 10

run () = x + x

Specifically:

{cannot_unify,main,5,t_int,{t_arrow,[],<0.178.0>}}

In other words, it's failing the type check because x is compiled as a zero-arg function, whereas the function + expects only integers. However, this compiles:

module example2

x = 10

run () = x

When called from Erlang, run () returns the zero arg function x, which can be invoked from Erlang to produce the value 10, but as far as I can tell but there is no way of getting the return value of x within Alpaca.

@j14159 has stated, and I agree, that nullary functions are not desirable and that it would be better if x is understood in this circumstance as a constant value. We discussed this on IRC and raised several issues:

When should the value be calculated? Compile time? Runtime (i.e. on module load?)
Where should the calculated values be stored? @j14159 suggested ETS, with the drawback that it would potentially be modifiable outside of the module
Would we allow side effects in values? OCaml, for example, allows this:

let _ =
  print_string "Hello\n";;

Suggestion: Optional arity for exports

It often is frustrating to update arity for changed functions' signatures. While there are definitely cases when one would keep some functions out from being exported, often times the need to maintain arity in export becomes a nuisance.

My proposal is to make arity specification in export optional, and if it isn't specified, export all functions with a given name, regardless of arity.

Programming to an interface/signature and default implementations in modules

This issue is for a discussion and collection of ideas at least related to modules and signatures.

In issue #87 the distinction between OCaml's open and include came up and how the former exposes the imported module's functions in the module doing the opening. Copied from that issue:

My basic opinion right now: I see value in default implementations of signatures/interfaces but would like to consider more specificity than open appears to provide. Thoughts? Links to papers most definitely appreciated :D

Sequences of bindings for `let ... in`

Rather than:

let add x y = x + y in
let square x = x * x in
add 2 (square 3)

I'd like to be able to do something like

let
  add x y = x + y;
  square x = x * x
in add 2 (square 3)

Multiple `_` in a pattern causes a compiler error

E.g.

g tuple =
  match tuple with
    (_, _, x) -> x

The instances of _ aren't being renamed and this should be relatively easy to fix in mlfe_ast_gen.erl.

Double line breaks in expressions fail to compile

It would be desirable to be able to split code up with several line breaks. This is an example where double line breaks \n\n are significant and cause a compilation error.

alpaca:compile({text, [<<"module a \n\nf a = let add x = x + x in\n\nadd a a\n\n">>]}).

Automatic code formatter

One of the things I'm really enjoying about Elm is the official style guide and automatic code formatting tools. One can write code in any sloppy style, hit save, and then the formatter will rewrite the file in the correct style.

This allows us to avoid extra typing and removes all code style squabbling between developers! Hooray! It'd be great to have this for Alpaca.

I believe this is generally done by converting source code into AST, and then pretty-printing that back. We could leverage the compiler for the parsing/AST generation if there is a public function to do this in the compiler source code.

Cheers,
Louis

Compiler doesn't generate module_info/0,1 functions

This is easily spotted when trying to TAB-complete functions available in an mlfe module:

4> M1.
{compiled_module,basic_adt,"basic_adt.beam",
                 <<70,79,82,49,0,0,1,176,66,69,65,77,65,116,111,109,0,0,0,
                   36,0,0,0,6,9,...>>}
5> code:load_binary(element(2, M1), element(3, M1), element(4, M1)).
{module,basic_adt}
6> basic_adt:<tried hitting TAB here>*** ERROR: Shell process terminated! ***

=ERROR REPORT==== 1-Jul-2016::11:41:45 ===
Error in process <0.62.0> with exit value:
{undef,[{basic_adt,module_info,[],[]},
        {edlin_expand,expand_function_name,2,
                      [{file,"edlin_expand.erl"},{line,54}]},
        {group,get_line1,4,[{file,"src/4.1.1/group.erl"},{line,568}]},
        {group,get_chars_loop,8,[{file,"src/4.1.1/group.erl"},{line,462}]},
        {group,io_request,5,[{file,"src/4.1.1/group.erl"},{line,181}]},
        {group,server_loop,3,[{file,"src/4.1.1/group.erl"},{line,117}]}]}
Eshell V7.2  (abort with ^G)
1>

M:module_info/1,2 are actually thin shims which call erlang:get_module_info/1,2 which both already work with mlfe modules. I can provide a PR implementing generation of the shims - what do you think?

Reuse variable in pattern matches for equality

In Erlang we can do the following to express two tuple items that are equal:

case Tuple of
    {X, X} -> ...
end

In Alpaca we currently have to do this:

match tuple with
  (x, y), x == y -> ...

I'd much prefer for us to be able to be like Erlang here:

match tuple with
  (x, x) -> ...

The latter seems like a more expressive form to me. This should be relatively simple to solve in the AST generation stage by rewriting multiple occurrences of a symbol in a pattern to a sequence of synthesized names and some added equality guards.

mlfe:file/{1,2}

Mirroring compile:file{1,2}, there ought to be mlfe:file/{1,2}. I intend to add this as soon as time permits.

Rename call_erlang keyword

Hello!

This keyword seems oddly named as it seems one would use it to call any BEAM language, not just Erlang. Seems odd to use this name to call Elixir or LFE.

Alternatives:

call
call_native
call_unsafe
call_beam

Exception AST nodes don't get their variables renamed

As part of AST rewriting before typing occurs the Alpaca compiler renames each variable in order to ensure uniqueness and that nothing escapes from things like receives. Arguments to throw, error, and exit aren't being correctly renamed at the moment.

Support infix functions

It would be great if we could define infix functions, such as the usual suspects (>>=), <*> etc., and even |> can be implemented this way as a function (such as it is in Ocaml and Elm) instead of hardcoding it in the Elixir manner.

I had a bit of a go at hacking on the parser to allow for infix definitions and had some partial success: master...lepoetemaudit:master but it fell over when parsing any usage of the symbols as a function (it expects 'symbols' which do not include the operators, and this approach was a really inelegant hack to begin with).

I went and looked at the Elm source, which as far as I can tell distinguishes between normal symbols and operator only symbols and realised that to implement this properly we'd probably need to do the same, i.e. remove all the hardcoded infix operators in the lexing phase (and the binary << and >> delimiters are problematic too - perhaps python's b"" style of binary string quoting could be used instead?). Then we could have 'operator' strings defined in the lexer which can go into the environment map as functions, and are recognisable at definition and in usage as infix. It would be fairly easy to either provide the existing math ones as 'built-ins' that wrap the Erlang ones, or switch them out directly as of now in the parser.

I'm willing to give this a go if it is of interest. I'm in awe of the potential of having an ML on the Erlang VM.

Suggestions for a name

Let's collect some suggestions for a name.

Peter Landin -> lander
Robin Milner -> ermil
Tarm (birthplace of Agner Krarup Erlang) plus ML -> tarmel or tarml
ML + erl -> merl, but merl already is a known Erlang module
lambda calculus -> lama

Automatic currying support

Taken from the discussion in issue #56 with @danabr and @lepoetemaudit

Single versions of a function will be automatically curried so the following will work:

foo x y = x + y

curry_foo () = 2 |> foo 1  -- results in 3

When there are different versions of the same named function (differing in arity), we will halt typing with an error in any ambiguous case. For example the following would generate an error along the lines of {error, {ambiguous_application, foo/1, foo/2}} - but maybe less hostile than that :)

foo x = x + x

foo x y = x + y

make_an_error () = 1 |> foo

While the following would not:

foo x = x + x

foo x y z = x + y + z

-- unit -> (int -> int)
passes_typing () = 2 |> foo 1

Feedback and differing opinions welcome. I think basic expression application as discussed in #56 has to be addressed before this issue is.

Patterns in function definitions

Given:

type option 'a  = None | Some 'a

Instead of

map opt f = match opt with
    None -> opt
  | Some x -> Some (f x)

I want to be able to write

map None _ = None

map (Some x) f = Some (f x)

This will require reworking bits of the parser as terms can nest simple expressions (simple_expr) in parens making things like pattern matches legitimate terms. We'll need something of a distinction in order to make a list of patterns explicitly that and rule out the occurrence of a match or something equally nonsensical in a function declaration.

Empty lines of whitespace more than `\n` not considered a break

An empty line between function declarations is necessary to distinguish the two as separate functions but if that empty line contains any whitespace more than a \n it's not treated as an empty line/break between function definitions.

Fails to compile when applying the result of a function

When a function returns a function, it cannot be applied (in MLFE - from Erlang, the resulting function can be called).

module curry_fun

export curried/1

curried arg1 =
  let curried2 arg2 =
    let curried3 arg3 = 
      arg1 + arg2 + arg3
    in curried3  
  in
    curried2

From Erlang, it works fine:

2> (((curry_fun:curried(1))(2))(3)).
6

But in MLFE, this won't compile: curried 1 2 3 resulting in a compiler error:

mlfe_typer:type_modules/2 crashed with error:{error,
                                              {arity_error,curry_fun,11}}

Using parens (to prevent what I presume is greedy application above) produces a different error:

v = ((curried 1) 2) 3

-------

mlfe_typer:type_modules/2 crashed with error:function_clause

Type checker does not check polymorphic user defined types

The following module should not pass type checking, but it does:

module tree

export height/1, fail/1

type tree 'a = Leaf | Node (tree, 'a, tree)

height t = 
  match t with
    Leaf -> 0
  | Node (l, _, r) -> 1 + (max (height l) (height r)) 

max a b = 
  match (a > b) with
    true -> a
  | false -> b

fail () = height 1

scanner.erl

I think scanner.erl should be renamed because it's very likely to clash with an already existing scanner.beam. It looks internal, so how about renaming to mlfe_scanner. Alternatively, and as a way to denote internal modules, maybe we should use mlfe_prv_scanner and probably rename other internal modules as well.