kindelia / hvm
A massively parallel, optimal functional runtime in Rust
Home Page: https://higherorderco.com
License: Apache License 2.0
HVM's runtime only handles direct, non-nested matches. So, for example, this is a valid rule:
(Foo (Bar xs)) = (A xs)
But this is not:
(Foo (Bar (Cons x xs))) = (A (Cons x xs))
Because there is a nested match. For these cases to work, we must split the left-hand side of nested patterns into several non-nested equations. Currently, the language has a flattener, but it is limited and will fail in some cases. For example, the function below:
Fn (Bar (Cons ...)) = A
Fn (Bar Nil) = B
Fn x = C
Should be compiled to:
Fn (Bar _0) = (Fn.0 _0)
(Fn.0 (Cons x y)) = A
(Fn.0 Nil) = B
(Fn.0 x) = C
Fn x = C
But the current flattener can't deal with that.
Consider the program calculating `sum (map head (tails list))`. Due to laziness, even though the normal form of `tails list` has O(n^2) elements, I expect the whole expression to run in linear time: we never need to fully evaluate each lazy sub-list, so the number of elements actually traversed is just n.
But unfortunately, HVM takes quadratic time and space in this scenario, indicating that it is not lazy on sub-lists and that all sub-list elements are actually generated.
HVM:
(Gen 0) = Nil
(Gen n) = (Cons n (Gen (- n 1)))
(Foldr f z Nil) = z
(Foldr f z (Cons x xs)) = (f x (Foldr f z xs))
(Tails Nil) = (Cons Nil Nil)
(Tails (Cons x xs)) = (Cons xs (Tails xs))
(Map f Nil) = Nil
(Map f (Cons x xs)) = (Cons (f x) (Map f xs))
(Head Nil) = 0
(Head (Cons x xs)) = x
(Main n) = (Foldr @a@b(+ a b) 0 (Map @l(Head l) (Tails (Gen n))))
Haskell:
import Prelude hiding (head, foldr, tails, map)
import Data.Word (Word32)
import System.Environment (getArgs)
data List a = Nil | Cons a (List a)
gen 0 = Nil
gen n = Cons n (gen (n - 1))
foldr f z Nil = z
foldr f z (Cons x xs) = f x (foldr f z xs)
tails Nil = Cons Nil Nil
tails (Cons x xs) = Cons xs (tails xs)
map f Nil = Nil
map f (Cons x xs) = Cons (f x) (map f xs)
head Nil = 0
head (Cons x xs) = x
main = do
n <- read . (!! 0) <$> getArgs :: IO Word32
print $ foldr (\a b -> a + b) 0 (map (\l -> head l) (tails (gen n)))
Run time:
| n | HVM (clang, -O2, single thread) | GHC, -O2 |
|---|---|---|
| 1000 | 131ms | 10ms |
| 2000 | 316ms | 10ms |
| 4000 | 1082ms | 12ms |
| 8000 | 5.039s (1.6G mem) | 15ms (5.0M mem) |
| 16000 | 31.018s (5.9G mem) | 20ms (5.6M mem) |
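For reference, the expected linear-time behavior can be sketched in plain Python (illustrative only, using tuples as cons cells): because the tails share structure rather than being copied, summing their heads touches O(n) cells, even though the tails collectively contain O(n^2) elements.

```python
def gen(n):
    # (Gen n) builds the list [n, n-1, ..., 1] as nested cons pairs
    xs = None                       # Nil
    for i in range(1, n + 1):
        xs = (i, xs)                # Cons i xs
    return xs

def tails(xs):
    # (Tails (Cons x xs)) = (Cons xs (Tails xs)); (Tails Nil) = (Cons Nil Nil)
    while xs is not None:
        xs = xs[1]
        yield xs                    # each tail is shared, not copied
    yield None

def head(xs):
    # (Head Nil) = 0; (Head (Cons x xs)) = x
    return 0 if xs is None else xs[0]

# Only the head cell of each tail is ever inspected: O(n) work overall.
assert sum(head(t) for t in tails(gen(10))) == 45
```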
I'd think this is a serious issue, since it's quite a common pattern to generate lazy data structures and share their parts everywhere (e.g. B-Trees, Finger Trees).
I've just noticed the recently Kickstarted SimulaVR "virtual reality computer". Which is cool on its own, but what's cooler is that the UX software is written in Haskell.
They seem to be planning a second version of the software, whose design and workings they are still contemplating. Wouldn't it be cool to write this next-gen UX software in something that targets HVM (e.g. Kind)?
It doesn't seem to me that there is extreme pressure to ship the next-gen version; it should rather be based on the experience and usage data gathered from the current version, which on its own will take months at least.
@georgewsinger if you find some time, feel free to chime in here 😉.
ToT (61a331f) doesn't appear to compile with gcc version 11.2.0 (from Ubuntu 21.10):
$ hvm c bench/Fibonacci/main.hvm
Compiled to 'bench/Fibonacci/main.c'.
$ gcc -O2 bench/Fibonacci/main.c
bench/Fibonacci/main.c:932:5: error: variably modified ‘normal_seen_data’ at file scope
932 | u64 normal_seen_data[NORMAL_SEEN_MCAP];
| ^~~~~~~~~~~~~~~~
Digging in, I see:
#define NORMAL_SEEN_MCAP (HEAP_SIZE/sizeof(u64)/(sizeof(u64)*8))
const u64 HEAP_SIZE = 8 * U64_PER_GB * sizeof(u64);
(Aside: in C, unlike C++, a `const`-qualified variable is not an integer constant expression, so `NORMAL_SEEN_MCAP` makes `normal_seen_data` a variably modified type, which is invalid at file scope. clang apparently tolerates this, but gcc does not; defining `HEAP_SIZE` and friends as `#define` or `enum` constants instead of `const u64` should fix the build.)
The following program is accepted, runs, and outputs (Foo 0 (Succ (Zero))):
(Foo Zero) = None
(Foo a (Succ pred)) = (Some a)
(Main) = (Foo 0 (Succ Zero))
I expected it to give an error because of the inconsistent number of parameters, or else for the output to be (Some 0).
On the other hand, this produces an error:
(Foo Zero) = None
(Foo a (Succ pred)) = (Some pred)
(Main) = (Foo 0 (Succ Zero))
I was implementing some lambda-encoded list functions when I ran across this error. Running the following code:
Pair = λx λy λp (p x y)
Bimap = λf λg λp λq (p (λx λy (q (f x) (g y))))
Uncurry = λf λp (p (λx λy (f x y)))
Inl = λx λl λr (l x)
Inr = λx λl λr (r x)
Nil = ((Inl) (λx x))
Cons = λx λxs ((Inr) ((Pair) x xs))
Y = (λ f ((λ x (f (x x))) (λ x (f (x x)))))
// ListFMap : (X -> Y) -> 1 + A x X -> 1 + A x Y
ListFMap = λf λlf λl λr (lf l (λp (r ((Bimap) (λx x) f p))))
// ListCata : (1 + A x S -> S) -> [A] -> S
ListCata = λa ((Y) (λf λl (a ((ListFMap) f l))))
// ListAlg : B -> (A -> B -> B) -> (1 + A x B -> B)
ListAlg = λn λr λlf (lf (λu n) ((Uncurry) r))
Foldr = λn λr ((ListCata) ((ListAlg) n r))
Append = λl1 λl2 ((Foldr) l2 Cons l1)
Concat = ((Foldr) Nil Append)
Main = ((Concat) ((Cons) ((Cons) 1 Nil) Nil))
causes the following error:
Reducing.
Expected Term:
0 | λ_ λx1 (x1 λx2 ((x2 1) (^19 ^60)))
thread 'main' panicked at 'No parse.'
with a red underline between `^19` and `^60`.
I modified the example program:
// Creates a tree with `2^n` elements
(Gen 0) = (Leaf 1)
(Gen n) = (Node (Gen(- n 1)) (Gen(- n 1)))
// Adds all elements of a tree
(Sum (Leaf x)) = x
(Sum (Node a b)) = (+ ((+ (Sum a) (Sum b))) ((- (Sum a) (Sum b))))
// Performs 2^n additions in parallel
(Main n) = (Sum (Gen n))
It often outputs 256, but sometimes outputs an unevaluated expression, and the leftover expressions differ between runs:
$ time ./main 8
Rewrites: 304812 (2.60 MR/s).
Mem.Size: 198923 words.
((((64+((32+((16+((8+((4+((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2)))+(4-((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2)))))+(8-((4+((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2)))+(4-((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2)))))))+(16-((8+((4+((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2)))+(4-((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2)))))+(8-((4+((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2)))+(4-((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2)))))))))+(32-((16+((8+((4+((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2)))+(4-((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2)))))+(8-((4+((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2)))+(4-((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2)))))))+(16-((8+((4+((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2)))+(4-((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2)))))+(8-((4+((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2)))+(4-((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2)))))))))))+(64-((32+((16+((8+((4+((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2)))+(4-((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2)))))+(8-((4+((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2)))+(4-((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2)))))))+(16-((8+((4+((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2)))+(4-((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2)))))+(8-((4+((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2)))+(4-((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2)))))))))+(32-((16+((8+((4+((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2)))+(4-((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2)))))+(8-((4+((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2)))+(4-((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 
1))+(1-(Sum 1)))-2)))))))+(16-((8+((4+((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2)))+(4-((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2)))))+(8-((4+((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2)))+(4-((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2))))))))))))+128)+0)
real 0m0.146s
user 0m0.052s
sys 0m0.111s
Rewrites: 305163 (2.14 MR/s).
Mem.Size: 200042 words.
(256+(((64+((((16+((((((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2))+((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2)))+(((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2))-((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2))))+8)+(((((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2))+((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2)))+(((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2))-((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2))))-8)))+(16-((((((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2))+((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2)))+(((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2))-((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2))))+8)+(((((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2))+((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2)))+(((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2))-((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2))))-8))))+32)+(((16+((((((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2))+((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2)))+(((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2))-((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2))))+8)+(((((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2))+((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2)))+(((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2))-((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2))))-8)))+(16-((((((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2))+((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2)))+(((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2))-((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2))))+8)+(((((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2))+((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2)))+(((((1+(Sum 
1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2))-((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2))))-8))))-32)))+(64-((((16+((((((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2))+((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2)))+(((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2))-((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2))))+8)+(((((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2))+((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2)))+(((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2))-((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2))))-8)))+(16-((((((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2))+((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2)))+(((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2))-((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2))))+8)+(((((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2))+((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2)))+(((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2))-((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2))))-8))))+32)+(((16+((((((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2))+((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2)))+(((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2))-((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2))))+8)+(((((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2))+((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2)))+(((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2))-((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2))))-8)))+(16-((((((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2))+((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2)))+(((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2))-((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2))))+8)+(((((((1+(Sum 1))+(1-(Sum 
1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2))+((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2)))+(((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2))-((((1+(Sum 1))+(1-(Sum 1)))+2)+(((1+(Sum 1))+(1-(Sum 1)))-2))))-8))))-32))))-128))
real 0m0.164s
user 0m0.045s
sys 0m0.130s
The `Fold`:
// Folds over a list
(Fold Nil c n) = n
(Fold (Cons x xs) c n) = (c x (Fold xs c n))
The `2`:
let g = λf(λx(f (f x)))
(g g)
How is it possible that `(g g)` is a no-no, but `Fold`, supposedly, can be used twice?
P.S.: Are the following scenarios possible and correct?
(Fold lst (\x y. ... (Fold ...) ...) 5)
(Fold lst Plus (... (Fold ...) ...))
The limitation seems to be there to prevent two non-annihilating `Dup` nodes from passing through one another and becoming four, isn't it?
Compiling this:
(Main) = (* 2147483648 2147483648)
Outputs this C code:
case _MAIN_: {
if (1) {
inc_cost(mem);
u64 ret_0;
if (get_tag(U_32(2147483648)) == U32 && get_tag(U_32(2147483648)) == U32) {
ret_0 = U_32(get_val(U_32(2147483648)) * get_val(U_32(2147483648)));
inc_cost(mem);
} else {
u64 op2_1 = alloc(mem, 2);
link(mem, op2_1 + 0, U_32(2147483648));
link(mem, op2_1 + 1, U_32(2147483648));
ret_0 = Op2(MUL, op2_1);
}
u64 done = ret_0;
link(mem, host, done);
clear(mem, get_loc(term, 0), 0);
init = 1;
continue;
}
break;
};
Where
ret_0 = U_32(get_val(U_32(2147483648)) * get_val(U_32(2147483648)));
the result of the multiplication can exceed 32 bits. This can cause type confusion, and code execution if you were to run arbitrary HVM code.
The same goes for (<< 3 60).
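A sketch of the arithmetic (plain Python, just to illustrate the ranges involved): both operands are valid u32 values, but their product needs 63 bits, so storing it unmasked spills into whatever bits sit above the 32-bit value field. The usual fix is to reduce the result mod 2^32:

```python
U32_MASK = 0xFFFFFFFF          # the 32-bit value field

a = b = 2147483648             # 2^31: each operand fits in a u32
product = a * b                # 2^62: far larger than 32 bits

assert product > U32_MASK      # the unmasked result overflows the field

wrapped = product & U32_MASK   # wrap around mod 2^32 instead
assert wrapped == 0            # 2^62 mod 2^32 == 0

# The shift example from the report overflows the same way:
assert (3 << 60) > U32_MASK
assert (3 << 60) & U32_MASK == 0
```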
As of commit 22b0019 and using main.hovm from the readme as the input source, running the compilation step:
clang -O2 main.hovm.out.c -o main.hovm.out
fails with:
main.hovm.out.c:964:23: warning: incompatible pointer to integer conversion assigning to 'Thd' (aka 'unsigned long') from 'void *' [-Wint-conversion]
workers[t].thread = NULL;
^ ~~~~
1 warning generated.
/nix/store/cdm6zywd51mbabxhklsixwcskv4n70s3-binutils-2.35.2/bin/ld: /tmp/main-a67557.o: in function `ffi_normal':
main.hovm.out.c:(.text+0x32fb): undefined reference to `pthread_create'
/nix/store/cdm6zywd51mbabxhklsixwcskv4n70s3-binutils-2.35.2/bin/ld: main.hovm.out.c:(.text+0x3311): undefined reference to `pthread_create'
/nix/store/cdm6zywd51mbabxhklsixwcskv4n70s3-binutils-2.35.2/bin/ld: main.hovm.out.c:(.text+0x3327): undefined reference to `pthread_create'
/nix/store/cdm6zywd51mbabxhklsixwcskv4n70s3-binutils-2.35.2/bin/ld: main.hovm.out.c:(.text+0x333d): undefined reference to `pthread_create'
/nix/store/cdm6zywd51mbabxhklsixwcskv4n70s3-binutils-2.35.2/bin/ld: main.hovm.out.c:(.text+0x3353): undefined reference to `pthread_create'
/nix/store/cdm6zywd51mbabxhklsixwcskv4n70s3-binutils-2.35.2/bin/ld: /tmp/main-a67557.o:main.hovm.out.c:(.text+0x3369): more undefined references to `pthread_create' follow
clang-7: error: linker command failed with exit code 1 (use -v to see invocation)
Looking in main.hovm.out.c, I can see that there's indeed a call to a function pthread_create
on line 971, but this function doesn't seem to be defined anywhere in the rest of the file.
System info:
$ clang --version
clang version 7.1.0 (tags/RELEASE_710/final)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /nix/store/ass1sf1bx07qvlrg02nymxnyzp1cpxz7-clang-7.1.0/bin
$ uname -a
Linux a 5.15.16_1 #1 SMP Thu Jan 20 17:19:29 UTC 2022 x86_64 GNU/Linux
Are there any other requirements aside from clang that I'm supposed to have? I also tried installing llvm and lld, but these didn't help.
(A likely culprit: `pthread_create` lives in libpthread, which many toolchains don't link by default; adding `-lpthread` to the clang invocation should resolve the undefined references.)
Hi,
First of all, implementing evaluation under lambdas is really cool! Thank you.
Second, could you please add some references or links to either HOW.md or a new Background.md document? I see that you mentioned some links in issue #60:
Right now the `src/runtime.c` template is not a valid C file before the substitutions are made. This makes developing on that file unnecessarily hard.
Some solutions to this are discussed in #32, e.g.:
Sure, or variations of `int value = /* START */ 0 /* END */;` expressions where it is important, and replace both tokens and the content they box.
Cloning the repo and doing `nix build` has the same result. `nix develop` also errors out with the same error.
$ nix build github:Kindelia/HVM
error: builder for '/nix/store/bgqdhks6dikjfbqmhc7iknx5pflgiar0-hvm-crate2nix.drv' failed with exit code 3;
last 10 log lines:
> drwxr-xr-x 11 nixbld nixbld 4096 Jan 1 1970 bench
> -rw-r--r-- 1 nixbld nixbld 320 Jan 1 1970 default.nix
> drwxr-xr-x 2 nixbld nixbld 4096 Jan 1 1970 examples
> -rw-r--r-- 1 nixbld nixbld 3408 Jan 1 1970 flake.lock
> -rw-r--r-- 1 nixbld nixbld 2961 Jan 1 1970 flake.nix
> -rwxr-xr-x 1 nixbld nixbld 624 Jan 1 1970 prep.sh
> -rw-r--r-- 1 nixbld nixbld 62 Jan 1 1970 rustfmt.toml
> -rw-r--r-- 1 nixbld nixbld 318 Jan 1 1970 shell.nix
> drwxr-xr-x 2 nixbld nixbld 4096 Jan 1 1970 src
> == ls -la (END)
For full logs, run 'nix log /nix/store/bgqdhks6dikjfbqmhc7iknx5pflgiar0-hvm-crate2nix.drv'.
(use '--show-trace' to show detailed location information)
The output of `nix log /nix/store/bgqdhks6dikjfbqmhc7iknx5pflgiar0-hvm-crate2nix.drv` is:
@nix { "action": "setPhase", "phase": "unpackPhase" }
unpacking sources
unpacking source archive /nix/store/h3g8skxxp8kazfzk8bbk92v8bryx3wfv-rh2xy9kpb76jc79mlc7gs83qxqhgfras-source
source root is rh2xy9kpb76jc79mlc7gs83qxqhgfras-source
@nix { "action": "setPhase", "phase": "buildPhase" }
building
++ crate2nix generate -f ./Cargo.toml -o Cargo-generated.nix -h /nix/store/rk2yq19vkhgx09fswn2gsm2zyzgmfy34-hvm-crate2nix/crate-hashes.json
Error: while retrieving metadata about ./Cargo.toml: `cargo metadata` exited with an error: error: the lock file /build/rh2xy9kpb76jc79mlc7gs83qxqhgfras-source/Cargo.lock needs to be updated but --locked was passed to prevent this
If you want to try to generate the lock file without accessing the network, remove the --locked flag and use --offline instead.
crate2nix failed.
== cargo/config (BEGIN)
[source.crates-io]
replace-with = "vendored-sources"
[source.vendored-sources]
directory = "/nix/store/5wivq9n9cy4fpcywkiir70p7bkkkabyc-deps"
== cargo/config (END)
== crate-hashes.json (BEGIN)
{}
== crate-hashes.json (END)
== ls -la (BEGIN)
total 132
drwxr-xr-x 7 nixbld nixbld 4096 Jan 1 1970 .
drwx------ 3 nixbld nixbld 4096 Mar 31 21:03 ..
drwxr-xr-x 3 nixbld nixbld 4096 Jan 1 1970 .github
-rw-r--r-- 1 nixbld nixbld 97 Jan 1 1970 .gitignore
drwxr-xr-x 2 nixbld nixbld 4096 Jan 1 1970 .vscode
-rw-r--r-- 1 nixbld nixbld 266 Jan 1 1970 BUILDING.md
-rw-r--r-- 1 nixbld nixbld 8695 Jan 1 1970 Cargo.lock
-rw-r--r-- 1 nixbld nixbld 612 Jan 1 1970 Cargo.toml
-rw-r--r-- 1 nixbld nixbld 35535 Jan 1 1970 HOW.md
-rw-r--r-- 1 nixbld nixbld 1068 Jan 1 1970 LICENSE
-rw-r--r-- 1 nixbld nixbld 802 Jan 1 1970 NIX.md
-rw-r--r-- 1 nixbld nixbld 10783 Jan 1 1970 README.md
drwxr-xr-x 11 nixbld nixbld 4096 Jan 1 1970 bench
-rw-r--r-- 1 nixbld nixbld 320 Jan 1 1970 default.nix
drwxr-xr-x 2 nixbld nixbld 4096 Jan 1 1970 examples
-rw-r--r-- 1 nixbld nixbld 3408 Jan 1 1970 flake.lock
-rw-r--r-- 1 nixbld nixbld 2961 Jan 1 1970 flake.nix
-rwxr-xr-x 1 nixbld nixbld 624 Jan 1 1970 prep.sh
-rw-r--r-- 1 nixbld nixbld 62 Jan 1 1970 rustfmt.toml
-rw-r--r-- 1 nixbld nixbld 318 Jan 1 1970 shell.nix
drwxr-xr-x 2 nixbld nixbld 4096 Jan 1 1970 src
== ls -la (END)
I want to make an interpreter - so it can fail with a trace of what it tried to do, and one I can play with.
I also want to make a HM-like typechecker (probably with some extensions).
For that, I need a formal definition of what's going on. This looks like Lafont's interaction nets, but with "superposition nodes", which - as I understand it - is a good way to avoid needing an oracle (which ruins everything). I kinda understand how they work from HOW.md, but not quite.
How are the `s` and `r` from `{s r}` linked to `\x.`, where `s` and `r` are clones of `b` from `\x.b`?
I also don't get how `Era` nodes reduce. Do they replace everything they interact with by `Era` nodes, while returning themselves to the freelist?
In the same vein as Wasm, adding a simple binary format might make HVM an easier/more compact target for language backends. Startup times may also be faster, as there's no need to parse Unicode and build up the syntax tree. Thoughts?
As an aside, are you planning on documenting a public interface and releasing on crates.io for use in other Rust projects?
$ cat /etc/os-release
PRETTY_NAME="Raspbian GNU/Linux 10 (buster)"
NAME="Raspbian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=raspbian
ID_LIKE=debian
HOME_URL="http://www.raspbian.org/"
SUPPORT_URL="http://www.raspbian.org/RaspbianForums"
BUG_REPORT_URL="http://www.raspbian.org/RaspbianBugs"
$ cat main.hvm
// Creates a tree with `2^n` elements
(Gen 0) = (Leaf 1)
(Gen n) = (Node (Gen(- n 1)) (Gen(- n 1)))
// Adds all elements of a tree
(Sum (Leaf x)) = x
(Sum (Node a b)) = (+ (Sum a) (Sum b))
// Performs 2^n additions in parallel
(Main n) = (Sum (Gen n))
$ RUST_BACKTRACE=1 hvm r main 3
thread 'main' panicked at 'capacity overflow', library/alloc/src/raw_vec.rs:509:5
stack backtrace:
0: rust_begin_unwind
at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:498:5
1: core::panicking::panic_fmt
at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/core/src/panicking.rs:107:14
2: core::panicking::panic
at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/core/src/panicking.rs:48:5
3: alloc::raw_vec::capacity_overflow
at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/alloc/src/raw_vec.rs:509:5
4: hvm::builder::eval_code
5: hvm::run_code
6: hvm::main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
$ git rev-parse HEAD
ff9ed47b163d6d24fdc6f968d35b95b81deb0969
$ cargo build
Finished dev [unoptimized + debuginfo] target(s) in 0.08s
$ cd bench/Fibonacci
$ ../../target/debug/hvm c main.hvm
Compiled to 'main.c'.
$ clang -g -O2 main.c -o main -lpthread
$ ./main 10
main: main.c:1337: int main(int, char **): Assertion `mem.node' failed.
Aborted
$
The message is from:
https://github.com/Kindelia/HVM/blob/ff9ed47b163d6d24fdc6f968d35b95b81deb0969/src/runtime.c#L1229
But I think the change in 99e3286 is ultimately what's causing it.
Hi there,
this is a really interesting project. I was wondering why the target runtime is obtained by transpiling to a C template, instead of directly using Rust itself (e.g. using Rayon); another option could be targeting something like LLVM.
Thank you
In HVM, an interesting way to represent numbers is to use λ-encoded digit-strings of varying bases, parametrized by a list of numbers (a base-list). For example, by using `[2,3,5]`, we represent numbers in base 2-3-5:
n = a + b * 2 + c * (2 * 3), where a < 2, b < 3, c < 5
Moreover, since `2 * 3 * 5 = 30`, numbers larger than `30` in base 2-3-5 will wrap around. That means operations on them will be mod `30`, with no need for a modulus operation. This may help implement `mod N` algorithms that fuse, by avoiding calls to `mod`, which could interrupt fusion.
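As a plain-Python sanity check of the encoding (illustrative helpers, not part of the HVM code below): digits are stored least-significant first, and any number is automatically reduced mod the product of the bases.

```python
def to_digits(n, bases):
    # n -> [n % b0, (n // b0) % b1, ...], least-significant digit first
    ds = []
    for b in bases:
        ds.append(n % b)
        n //= b
    return ds

def from_digits(ds, bases):
    # inverse of to_digits: n = d0 + d1*b0 + d2*b0*b1 + ...
    n, m = 0, 1
    for d, b in zip(ds, bases):
        n += d * m
        m *= b
    return n

bases = [2, 3, 5]
assert to_digits(7, bases) == [1, 0, 1]            # 7 = 1 + 0*2 + 1*6
assert from_digits([1, 0, 1], bases) == 7
# numbers wrap around mod 2*3*5 == 30:
assert from_digits(to_digits(37, bases), bases) == 37 % 30
```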
The base-list can be anything. For example, `[100]` would give us operations mod `100`; but, in turn, each number would use a lot of space, since each digit has 100 possible values. A better idea is to factorize `100` and use `[2,2,5,5]` instead, which also gives us operations mod `100`, but using much less space. Of course, if we use a base-list with identical numbers, say `[2,2,2,2]`, we'll just have binary numbers, or N-ary numbers for whatever the `N` is. Here is the code:
// Applies `f` `n` times to `x`, directly
(Repeat 0 f x) = x
(Repeat n f x) = (f (Repeat (- n 1) f x))
// Given a base-list, applies `f` `n` times to `x`,
// in such a way that is optimized for that base-list
(Apply Nil n f x) = x
(Apply (Cons b bs) n f x) = (Apply bs (/ n b) λk(Repeat b f k) (Repeat (% n b) f x))
// Given a base-list, applies `f` `n` times to `x`,
// in such a way that is optimized for that base-list
(Times Nil n f x) = x
(Times (Cons b bs) n f x) = ((TimesGo b b bs n) f x)
(TimesGo 0 b bs n) = (n λfλx(x))
(TimesGo i b bs n) = ((TimesGo (- i 1) b bs n) λpλfλx(Times bs p λk(Repeat b f k) (Repeat (- i 1) f x)))
// Given a base, ends a digit-string
(E base) = λend (EGo base end)
(EGo 0 end) = end
(EGo base end) = λctr (EGo (- base 1) end)
// Given a base, appends `digit` to a digit-string
(D base digit pred) = λend (DGo base digit pred λx(x))
(DGo 0 n pred ctr) = (ctr pred)
(DGo base 0 pred ctr) = λctr (DGo (- base 1) (- 0 1) pred ctr)
(DGo base n pred ctr) = λera (DGo (- base 1) (- n 1) pred ctr)
// Given a base-list, converts a digit-string to a list
(ToList Nil xs) = Nil
(ToList (Cons b bs) xs) = (ToListGo b bs xs)
(ToListGo 0 bs xs) = (xs Nil)
(ToListGo b bs xs) = ((ToListGo (- b 1) bs xs) λp(Cons (- b 1) (ToList bs p)))
// Given a base-list, converts a digit-string to a number
(ToU32 bases xs) = (ToU32Go bases (ToList bases xs) 1)
(ToU32Go Nil Nil m) = 0
(ToU32Go (Cons b bs) (Cons x xs) m) = (+ (* x m) (ToU32Go bs xs (* m b)))
// Given a base-list, returns the number 0
(Zero Nil ) = End
(Zero (Cons b bs)) = (D b 0 (Zero bs))
// Given a base-list and a u32 `i`, returns the number `i` as a digit-string
(Number bases i) = (Apply bases i λx(Inc bases x) (Zero bases))
// Given a base, applies a function to the predecessor
// (Inj [3] f λeλaλbλc(a pred)) == λeλaλbλc(a (f pred))
(Inj base f xs) = λen (InjGo base f (xs λf(en)))
(InjGo 0 f xs) = (xs f)
(InjGo b f xs) = λv (InjGo (- b 1) f (xs λpλf(v (f p))))
// Given a base-list, increments a digit-string
(Inc Nil xs) = Nil
(Inc (Cons b bs) xs) = λen λn0 (IncMake (- b 1) bs (xs en) λp(n0 (Inc bs p)))
(IncMake 0 bs xs ic) = (xs ic)
(IncMake n bs xs ic) = λv (IncMake (- n 1) bs (xs v) ic)
// Given a base-list, increments `b` a total of `a` times
// This is equivalent to addition, and is fast due to fusion
(Add bases a b) = (Times bases a λx(Inc bases x) b)
// Given a base-list, creates an adder for two digit-strings, with carry bits
(AdderCarry Nil xs) = λys(ys)
(AdderCarry (Cons b bs) xs) = (AdderCarryGo b b bs xs)
(AdderCarryGo 0 b bs xs) = (xs λys(ys))
(AdderCarryGo i b bs xs) = ((AdderCarryGo (- i 1) b bs xs) (λxs λys (Repeat (- i 1) λx(Inc (Cons b bs) x) (Inj b (AdderCarry bs xs) ys))))
// Given a base-list, adds two bit-strings, with carry bits
(AddCarry bases xs ys) = ((AdderCarry bases xs) ys)
// FIXME: this is wrong, only works if all bases are the same
(Mul bases xs ys) = (MulGo bases xs λk(Add bases ys k))
(MulGo Nil xs add) = End
(MulGo (Cons b bs) xs add) = (MulDo b b bs xs add)
(MulDo b 0 bs xs add) = (xs End)
(MulDo b i bs xs add) = ((MulDo b (- i 1) bs xs add) λp(Repeat (- i 1) add (D b 0 (MulGo bs p add))))
(Main x) =
let bases = (Cons 2 (Cons 3 (Cons 5 (Cons 7 Nil))))
let to_list = λx(ToList bases x)
let to_u32 = λx(ToU32 bases x)
let times = λnλfλx(Times bases n f x)
let apply = λnλfλx(Apply bases n f x)
let zero = (Zero bases)
let inc = λx(Inc bases x)
let add = λaλb(Add bases a b)
let addc = λaλb(AddCarry bases a b)
let mul_a = λaλb(Times bases a (add b) zero) // mul by repeated add by repeated inc
let mul_b = λaλb(Times bases a (addc b) zero) // mul by repeated add with carry
let mul_c = λaλb(Mul bases a b) // mul using the incorrect algorithm
let num = λi(Number bases i)
(to_u32 (add (num 123456789) (num 987654321)))
This snippet includes an example of computing `123456789 + 987654321` mod `210`, and it efficiently outputs `60`; no call to `mod` is needed. Problems:
How do we implement `mul` and `exp` efficiently? The code above has a `mul`, but it is wrong (see the FIXME).
Given a function `f(x) = a^x mod N`, what is the optimal algorithm for finding its period?
Awesome project, got to it via HN.
I thought I'd try to port microKanren to HVM (why not), but it requires matching on a lambda, or a slight change in the semantics.
Here's what I came up with so far; it's likely I missed a few things or got them wrong:
(Car (Cons x y)) = x
(Cdr (Cons x y)) = y
(Vareq (Var v1) (Var v2)) = (== v1 v2)
(ExtS x v s) = (Cons (Cons x v) s)
(EgInner sc NIL) = MZERO
(EgInner sc s) = (Unit (Cons s (Cdr sc)))
(Eg u v) =
λsc(EgInner sc (Unify u v (Car sc)))
(Unit sc) = (Cons sc MZERO)
MZERO = NIL
(UnifyInner u v NIL) = NIL
(UnifyInner (Var v) (Var v) s) = s
(UnifyInner (Var u) v s) = (ExtS (Var u) v s)
(UnifyInner u (Var v) s) = (ExtS (Var v) u s)
(UnifyInner (Cons hu tu) (Cons hv tv) s) =
(let s = (Unify hu hv s);
(Unify tu tv s))
(UnifyInner u u s) = s
(Unify u v s) =
(UnifyInner (Walk u s) (Walk v s) s)
(Walk u NIL) = u
(Walk (Var u) (Cons (Cons (Var u) v) s)) = (Walk v s)
(Walk (Var u) .) = (Var u)
(CallFresh f) = @sc(let c = (Cdr sc);
((f (Var c)) (Cons (Car sc) (+ c 1))))
(Disj g h) = @sc(Mplus (g sc) (h sc))
(Conj g h) = @sc(Bind (g sc) h)
(Mplus NIL y) = y
(Mplus (Cons h t) y) = (Cons h (Mplus t y))
(Mplus fn x) = @(Mplus x (fn))
(Bind NIL g) = MZERO
(Bind (Cons h t) g) = (Mplus (g h) (Bind t g))
(Bind fn g) = @(Bind (fn) g)
EMPTY_STATE = (Cons NIL 0)
// uK ends here
(Zzz g) = @sc(@(g sc))
(ConjL (Cons g NIL)) = (Zzz g)
(ConjL (Cons g gs)) = (Conj (Zzz g) (ConjL gs))
(DisjL (Cons g NIL)) = (Zzz g)
(DisjL (Cons g gs)) = (Disj (Zzz g) (DisjL gs))
(Conde (Cons l alts)) = (DisjL (Cons (ConjL l) (Conde alts)))
(Fresh NIL goals) = (ConjL goals)
(Fresh (Cons v lvars) goals) = @v(Fresh lvars goals)
(Pull NIL) = NIL
(Pull (Cons x y)) = (Cons x y)
(Pull f) = (Pull (f))
(Take 0 f) = NIL
(Take n NIL) = NIL
(Take n (Cons x y)) = (Cons x (Take (- n 1) y))
(Take n f) = (Take n (Pull f))
//
(Fives x) = (Disj (Eg x 5) (Fives x))
(Main) = (let f = (CallFresh Fives);
(f EMPTY_STATE))
I was playing around with lambda-encoded bitstrings and came across this weird behavior. Take the following functions:
Empty = λe λo λi e
(Zero s) = λe λo λi (o s)
(One s) = λe λo λi (i s)
(Inc b) =
λe λo λi
(b (o Empty)
i
(λx (o (Inc x))))
If we set `Main = (One Empty)` we get
λ_ λ_ λx1 (x1 λx2 λ_ λ_ x2)
If we set `Main = (Inc (Inc Empty))` we get
λ_ λx1 λx2 (x2 λx3 λ_ λ_ x3)
which, while printed a bit differently, is clearly the same term.
If we set `Main = (Inc (One Empty))` we get
λ_ λx1 λ_ (x1 λ_ λx2 λ_ (x2 λx3 λ_ λ_ x3))
which is what I expect. If we set `Main = (Inc (Inc (Inc Empty)))` we should get something alpha-equivalent, but instead we get a stack overflow.
The same technique also applies for Inc. We start with the usual definition:
(Inc E) = E
(Inc (O x)) = (I x)
(Inc (I x)) = (O (Inc x))
I don't understand this. It looks a lot like you're defining `Inc 0` to be `0` in the first line. But surely if you increment 0 you get 1, right? Shouldn't it be
(Inc E) = (O E)
If so, it looks a lot like defining numbers in a bijective base-2 scheme. While that is absolutely great and makes perfect sense to me, is it the usual definition? I thought that in lambda calculus numbers are commonly defined in unary, i.e. just as repeated application.
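One way to read the definition (an interpretation, not an authoritative answer): `(Inc E) = E` drops the final carry, so a digit-string of k bits behaves like a k-bit machine word, wrapping around mod 2^k rather than growing. A Python sketch of the three rules:

```python
# Bitstrings, least-significant bit first: E ends the string,
# (O x) has value 2*val(x), (I x) has value 2*val(x) + 1.
E = ("E",)
def O(x): return ("O", x)
def I(x): return ("I", x)

def inc(b):
    if b[0] == "E": return E           # (Inc E) = E  -- the carry out is dropped
    if b[0] == "O": return I(b[1])     # (Inc (O x)) = (I x)
    return O(inc(b[1]))                # (Inc (I x)) = (O (Inc x))

def val(b):
    if b[0] == "E": return 0
    return 2 * val(b[1]) + (1 if b[0] == "I" else 0)

zero3 = O(O(O(E)))                     # a 3-bit string holding 0
x = zero3
for n in range(1, 8):
    x = inc(x)
    assert val(x) == n                 # counts 1..7 as expected
x = inc(x)
assert val(x) == 0                     # the 8th increment wraps: mod 2**3
```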
Currently, the C allocator just reserves 8 GB of memory when the program starts and, if there are multiple threads, that memory is split between them. So, for example, if there are 8 threads, each one gets 1 GB to work with. A worker can read from other workers' memories (which may happen for shared data, i.e., when they pass through a DP0/DP1 pointer), but only the owner of the slice can allocate memory from it. Each worker also keeps its own freelist.
As such, a huge improvement would be thread-safe, global, possibly arena-based, alloc and free.
Right now, the task scheduler just partitions threads evenly among the normal form of the result. So, for example, if the result is `(Pair A B)` and there are 8 threads, 4 will be assigned to reduce A and 4 to reduce B. This is obviously very limited: if A reduces quickly, the scheduler won't re-assign those threads to help reduce B, so the program will only use half of the cores from that point on. And so on. This is why algorithms that return lists are often slower: they aren't really using the threads at all. In the worst case, it just falls back to being single-threaded.
A huge improvement would be a proper scheduler, that adds potential redexes to a task pool. When I attempted to do that, the cost of synchronization added too much overhead, ruining the performance. Perhaps a heuristic to consider would be to limit the depth of the redexes for which global tasks are emitted; anything below, say, depth=8, would just be reduced by the thread that reaches it. Many improvements can be done, though.
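The depth heuristic mentioned above can be sketched as follows (illustrative Python with a hypothetical `SPAWN_DEPTH` cutoff; a real implementation would need a lock-free or low-contention pool):

```python
from collections import deque

SPAWN_DEPTH = 8        # below this depth, emit global tasks; past it, stay local
pool = deque()         # shared task pool that idle workers would steal from

def on_redex(redex, depth):
    if depth < SPAWN_DEPTH:
        pool.append((redex, depth))   # shallow redex: worth the sync cost
        return "global"
    return "local"                    # deep redex: reduce on this thread

assert on_redex("near the root", 2) == "global"
assert on_redex("deep inside", 20) == "local"
assert len(pool) == 1
```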
Right now, HVM only supports U32. The numeric types above should be supported too. I32 and F32 should be easy to add, since they are unboxed, like U32. I64, U64 and F64 require implementing boxed numbers, since they don't fit inside a 64-bit Lnk (due to the 4-bit pointer), but shouldn't be hard. Discussion on whether we should have unboxed 60-bit variants is valid.
On #81.
On #54.
Using HVM as a pure Rust lib inside other projects should be very easy, especially considering the interpreter has fairly good performance (only about 50% slower than the single-threaded compiled version). We must think of the right API to expose these features in a Rust-idiomatic, future-proof way. Input from the Rust community is appreciated.
Right now, HVM compiles to C. That means a C compiler like clang or gcc is necessary to produce executables, which means that it isn't portable when used as a lib. Of course, libs can just use the interpreter, but it is ~3x slower than the compiler, and is not parallel. Ideally, we'd instead JIT-compile HVM files using WASM or perhaps Cranelift, allowing lib users to enjoy the full speed of HVM in a portable way.
Adding IO should be easy: just write an alternative version of normalize
that, instead of just outputting the normal form, pattern-matches it against the expected IO type, and performs IO operations as demanded.
The compiler doesn't support Windows yet. The use of -lpthread may be an issue. The interpreter should work fine, though.
On #52.
Compiling to the GPU would be amazing, and I see no reason this shouldn't be possible. The only complications I can think of are:
How do we alloc inside a GPU work unit?
Reduce() is highly branching, so this may destroy the performance. But it can be altered to group tasks by category and merge all branches in one go. The reduce function essentially does this: branch -> match -> alloc -> subst -> free. The last 4 steps are very similar regardless of which branch it falls into. So, instead, the branch part can just prepare some data, and the match -> alloc -> subst -> free passes can all be moved to the same branch, just with different indices and parameters. This will avoid thread divergence.
GPGPU is a laborious mess. I'm on OSX, so I can't use CUDA; OpenCL is terrible; WebGPU is too new (and, from what I could see, doesn't have the ability to read and write the same buffer in the same pass!?). Regardless, we'd probably need different versions: Metal for OSX, CUDA or OpenCL for Windows depending on the GPU, etc. So that's a lot of work that I can't do myself; it would require expensive engineers.
HVM is meant to be a "low-level target", which is kinda confusing because it is actually very high-level, but in the sense that users shouldn't be writing it directly. Ideally, Kind will compile to HVM, but it needs a lot of adjustments before that is possible.
See this issue for more information, as well as the superposed duplication section on HOW.md.
On #61.
This project has basically no tests. Write tests!
Will edit the thread as I think of more things. Feel free to add to this thread.
https://github.com/Kindelia/HVM/blob/master/HOW.md#hvms-core-language-overview
term ::=
| λvar. term # a lambda
| (term term) # an application
| (ctr term term ...) # a constructor
| num # a machine int
| (op2 term term) # an operator
| let var = term; term # a local definition
rule ::=
| term = term
file ::= list<rule>
The grammar in the link above exacerbates confusion about the issue in the link below:
Also, regardless of this syntax issue, isn't the grammar still technically incorrect, because for ctr applications you need a rule, not just a term? Maybe like one of these, depending on whether you adopt []?
| (term term) # a lambda application
| (Rule term) # a rule application
or
| (term term) # a lambda application
| [rule term] # a rule application
The following program:
(Sum Nil ) = 0
(Sum (Cons x xs)) = (+ x (Sum xs))
(Mul Nil ) = 1
(Mul (Cons x xs)) = (* x (Mul xs))
(Main x) =
let xs = (Cons 1 (Cons 2 (Cons 3 Nil)))
(Pair (Sum xs) (Mul xs))
Sometimes gives the wrong result when evaluated in parallel, due to a synchronization issue with dup nodes. As I mention on HOW.md, the fact that values exist in only one place greatly simplifies parallelism, and most rewrite rules are thread-safe by design, since the thread has full ownership of every memory location it reads/writes. There is an exception to that rule, though: dup nodes. For example, consider the program below:
dup a b = (Cons 1 (Cons 2 (Cons 3 Nil)))
...
It can happen that two different threads arrive at the same dup node (which is impossible for all other nodes): one coming from the a binder, and the other coming from the b binder. This will cause both threads to attempt to apply the DUP-CTR rule at the same time, which may result in errors. To avoid this issue, we must make sure that, whenever a thread accesses a dup node from one of its binders (either a or b here), it locks the area, preventing outside threads from accessing it. This was (naively) prevented by using atomic_inc as a lock, but that is a temporary solution, which was expected to break in some cases, as depicted above. A more robust solution must be carefully designed to ensure that two threads won't access the same dup node at the same time; i.e., once a thread accesses a dup node, it must lock any subsequent access to it, and a thread that attempts to do so must spinlock until the other is done.
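The locking discipline described above can be modeled in miniature. This is a toy Haskell sketch, not the runtime's actual C code: the dup node lives behind an MVar used as a lock, so at most one thread can apply the rewrite, and a second accessor blocks until the first is done and then sees the already-rewritten node. Node and reduceDup are illustrative names.

```haskell
import Control.Concurrent.MVar

-- A dup node that is either still unreduced or already rewritten.
data Node = Unreduced Int | Reduced Int

reduceDup :: MVar Node -> IO Int
reduceDup node = do
  n <- takeMVar node               -- acquire: at most one thread inside
  case n of
    Reduced v   -> do putMVar node (Reduced v); pure v
    Unreduced x -> do
      let v = x + 1                -- stand-in for the DUP-CTR rewrite
      putMVar node (Reduced v)     -- release with the rewritten node
      pure v
```

Whichever thread arrives second simply waits on the MVar instead of racing on the same memory, which is the invariant the issue asks for.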
Some notes I took while reading your introduction to your exciting project from my perspective as someone familiar with performant GHC-Haskell code. Of interest to me is how exactly everything is mapped to hardware, at least to the point of heap object layout and memory reads/writes. Not everybody might need that much detail, but perf seems like the main selling point of overhauling the machine model of LC languages, so that's what I expect to take away from reading. Later... Looking forward to more details on that in the dedicated section. :)
dup and reference counted thunks
Once I read (at the end of this section)
Collection is orchestrated by variables that go out of scope.
I immediately thought "Ah, so dup nodes are like reference counted thunks". Is that true? Why isn't it true?
I presume it has something to do with reduction of open expressions (under lambdas). Later... Indeed, superposition means I need to remember which "reference count" referenced which entry in the superposition. So there is more structure. I wish that was mentioned earlier, so that I knew where to look for the difference.
I find the example at the end of https://github.com/Kindelia/HVM/blob/master/HOW.md#lambda-duplication a bit unconvincing, because no optimising compiler would leave behind an obvious redex like (λx. ...) 2
. GHC would start with
dup f g = λy (Pair 4 y)
(Pair (f 10) (g 20))
to begin with. I know that there are probably better examples with open expressions (otherwise, what's the point?), perhaps one with an expression under a non-redex lambda that can't be floated out because it mentions the lambda-bound variable or some such (no idea, really). Giving one of those would be much more informative!
In other words, I think this section should mention the "killer example".
What follows is more like a few loose connections I can make as someone familiar with the way GHC-compiled programs work.
The takeaway of this section for me seems to be that we want to eta-expand all definitions as much as possible. In GHC that is generally true, too, but problematic if doing so destroys sharing (for example, when expanding over a shared let binding, effectively floating it inside a lambda). Are there similar cases in which sharing is destroyed, due to eagerly dup-ped lambdas?
For example, a clever way to implement a Data.List would be to let all algorithms operate on λ-encoded Church Lists under the hoods, converting as needed. This has the same "deforestation" effect of Haskell's rewrite pragmas, without any hard-coded compile-time rewriting, and in a more flexible way.
In fact, I've argued that GHC's rewrite RULEs from [a]
to fusion helpers should be made more explicit by introducing a data type
newtype FB a = FB { unFB :: forall b. (a -> b -> b) -> b -> b }
mapFB :: (a -> b) -> FB a -> FB b
...
rather than having the helpers scattered about like this. Indeed, if you only used FB
everywhere we use lists today, we wouldn't need any rewrite rules at all.
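The FB sketch above can be completed along these lines (my own filling-in of the "..."; fromList/toList are assumed helper names, not from GHC):

```haskell
{-# LANGUAGE RankNTypes #-}

-- Church-encoded lists: a list is its own foldr.
newtype FB a = FB { unFB :: forall b. (a -> b -> b) -> b -> b }

fromList :: [a] -> FB a
fromList xs = FB (\c n -> foldr c n xs)

toList :: FB a -> [a]
toList (FB k) = k (:) []

-- mapFB fuses by construction: mapFB f (mapFB g xs) composes the two
-- transformations inside the continuation and never builds an
-- intermediate list, with no rewrite rules involved.
mapFB :: (a -> b) -> FB a -> FB b
mapFB f (FB k) = FB (\c n -> k (c . f) n)
```

This is exactly the deforestation-by-representation idea the comment describes: if FB were used everywhere lists are used today, the fusion would fall out of the encoding.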
For example, using map in a loop is "deforested" in HVM. GHC can't do that, because the number of applications is not known statically.
Indeed, mapFB f . mapFB g
is optimised to mapFB (f . g)
, too by GHC. I presume that you mean that Interaction Nets support these optimisations dynamically, for example if mapFB g
is passed as some function parameter x
and we have mapFB f . x
in the body. That indeed seems like it has an edge over what GHC does!
Note that too much cloning will often make your normal forms large, so avoid these by keeping your programs linear.
Yes, this seems like the loss of sharing due to excessive eta-expansion I talk about above. Note also that transforming the arity 2 function into the arity 1 function is unsound, as it makes Add
stricter: Add undefined `seq` ()
diverges after the transformation. So the programmer has to write the arity 1 definition from the beginning.
Judging from what reduce does, it seems to me that the implementation is more like a bytecode interpreter because of the lack of a compile-time mapping from Lambdas to actual machine code. Do you have ideas for how such a mapping might work? Similar to how "graph reduction" transitioned to today's "supercombinator" encodings (which is just the familiar lambda=<static link, code ptr> encoding again).
I saw the implementation of u64 normal_join(u64 tid) {...}
in runtime.c
and its use and wonder whether the HVM language (or is it HOVM language?) guarantees no deadlocks/livelocks in this case?
If so, how? If not, then this should be taken into consideration if I understood the inner workings correctly.
I'm unable to do any mathematical proofs of this, so I've turned to asking about this 😉.
It would be nice to have some form of debugger for HVM - with ability to stop at breakpoints to see what expressions are the names bound to at the moment.
I appreciate the work and thought you put into this project! It got me quite excited, to be honest; now I really want to know if/how interaction nets can actually be mapped to stock hardware efficiently.
But it appears to me that the benchmarks comparing to GHC-compiled Haskell aren't entirely faithful. You say
Since lists are sequential, and since there are no higher-order lambdas, HVM doesn't have any technical advantage over GHC.
That's technically true, but it's also not the kind of code you'd write in Haskell. For one, the tail-recursive nature of range
means that you really have to build up a list that has to be copied during GCs. But Haskell (and HVM) is a lazy language and I'd rather write a generative definition of range
, like
range 0 = Nil
range n = Cons n (range (n-1))
And with this definition, the entire program operates in constant space (modulo GC) due to the perks of laziness. Here's the complete program:
import System.Environment
import Data.Word
data List a = Nil | Cons a (List a)
-- Folds over a list
fold Nil c n = n
fold (Cons x xs) c n = c x (fold xs c n)
-- A list from 0 to n
range 0 = Nil
range n =
let m = n - 1
in Cons n (range m)
-- Sums a big list with fold
main = do
n <- read.head <$> getArgs :: IO Word32
let size = 1000000 * n
let list = range size
print $ fold list (+) 0
This program takes about 1s for n=30
, the same as the original program takes for n=10
for me (and copies 1.6GB during GC). So about a factor 3 speedup.
Interaction nets might enjoy similar speedups, but I don't know that for sure. So it seems like a bit of an apples to oranges comparison.
Ultimately, making use of foldl'
and list fusion on [0..size-1::Int]
(Word32
is broken) means another factor 100 of a speedup, of which I similarly don't know how it carries over to interaction nets.
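For reference, the strict-accumulator variant alluded to above might look like this (a sketch; the factor-100 figure is the commenter's measurement, not mine):

```haskell
import Data.List (foldl')

-- Sums 0..size-1 with a strict left fold. With -O2, GHC's list fusion
-- eliminates the enumeration entirely, compiling this to a tight,
-- allocation-free counting loop.
sumTo :: Int -> Int
sumTo size = foldl' (+) 0 [0 .. size - 1]
```

The point of the comparison is that this version never materializes a list at all, so it sidesteps both GC copying and laziness overhead.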
I think we could get a lot of mileage out of HVM already by exposing stdio through a minimalist line-oriented IO primitive that might look like this.
(IO Input (State s) (Line c s)) = (IO Output (YourProgram (State s) (Line c s)))
(YourProgram state inputline) = ((State ...) (Line ...))
Here I write out (Line c s)
to show it is a cons list of characters (atoms). (State s)
is I guess a state burrito.
A line is a zero-terminated byte list representing an ASCII string.
Then we can pipe into stdin and out of stdout and have some useful programs.
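The proposed loop can be modeled in Haskell (a toy sketch of the idea, not HVM's API; step is an illustrative stand-in for YourProgram): the runtime reads a line, feeds it with the current state to a pure step function, emits the resulting line, and continues with the new state.

```haskell
-- Pure step function: takes the state and an input line, returns the
-- new state and the output line.
step :: Int -> String -> (Int, String)
step s line = (s + 1, show s ++ ": " ++ line)

-- Drive the step function over a stream of lines, threading the state.
run :: Int -> [String] -> [String]
run _ []       = []
run s (l : ls) = let (s', out) = step s l in out : run s' ls

-- Wiring this to real stdio could be as simple as:
--   main = interact (unlines . run 0 . lines)
```

With that, piping into stdin and out of stdout gives useful programs, as the comment suggests.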
I wonder what's the motivation behind rewriting the VM in Rust? Don't you lose one of the major selling points (verifiable VM)?
There's no reason you'd have to write a full scheduler yourself.
Crossbeam can do at least some of the work for you.
It has a work stealing deque and message passing essentials.
It would probably need better error handling. The ideal would be something like Rust's error messages (or better).
In my view there are two approaches for an LSP: creating something on top of an exposed API for handling errors, or creating a checker from zero. The latter is not great, because it means more code to maintain.
Right now the generated C code can't be compiled on Windows due to the use of <pthread.h> etc.
One way it could be done would be to compile it using a compatibility layer like Cygwin, but I never did this before.
Help from some Windows programmer would be appreciated. :3
I'd like to confirm the theoretical background of HOVM.
I've read;
#14 (comment) @leonardohn
But a target is in fact a machine. Take for instance the LLVM compilation targets like X86, ARM, SPIR-V or even WASM; They all resemble computer architectures that are either real hardware (as in X86 and ARM) or a virtual hardware (like SPIR-V or WASM). I agree that historically virtual machines are usually related to traditional hardware (sharing characteristics like using von Neumann-like architectures), but in this particular case, we are talking about an abstraction that doesn't have memory at all, just like original lambda calculus when compared to Turing machines. So, it doesn't even make sense talking about garbage collection inside this machine, unless when considering the implementation of this abstraction over bare metal hardware, which in turn also don't require garbage collection.
This is extremely interesting, especially;
in this particular case, we are talking about an abstraction that doesn't have memory at all, just like original lambda calculus when compared to Turing machines.
My assumption right now is that this VM is a successor of:
parallel_lambda_computer_tests
optlam -> Absal -> Symmetric-Interaction-Calculus (The Symmetric Interaction Calculus)
-> EA-NET + Elementary Affine Core (EA-CORE) (Specification)
-> FM-NET + FM-CORE (Formality-Core) (Formality-Core/Whitepaper.md, Soonad/Whitepaper)
-> HOVM
Am I correct?
Please let me know.
Thanks.
Python script to run blackbox tests on HVM programs.
e.g.
https://github.com/quleuber/HVM/tree/tests/test
Here's a lambda term I have used in the past as a benchmark for my own lambda calculus interpreter. My interpreter runs it in 3.5s, GHC runs it in 2.5s, and neither goes past the minimum heap size of a few megabytes.
When I run it in HVM, it gradually allocates more and more memory, finishing up with around 8.5 GB and then segfaulting.
Steps to reproduce:
$ cat > bench.hvm <<EOF
// Computes 0 by subtracting a number from itself
Main = ((λbig λsub (sub big big))
// the big number, 43'046'721
((λs λz (s(s(s(s z))))) (λs λz (s(s z))) (λs λz (s(s(s z)))))
// the subtraction algorithm
(λn λm λs λz
(n (λy λk (k (s (y (λa λb a))) y))
(λk (k z (λ_ z)))
(m (λk λa λb (b k)) (λa λb a)))))
EOF
$ cargo run -- c bench.hvm
Finished dev [unoptimized + debuginfo] target(s) in 0.01s
Running `target/debug/hvm c bench.hvm`
Compiled to 'bench.c'.
$ gcc -o test -pthread -O2 bench.c
$ ./test
Segmentation fault (core dumped)
First of all, thanks for this very interesting project!
I tried some of the examples, and the last one about lambda arithmetic:
// Increments a Bits by 1
(Inc xs) = λex λox λix
let e = ex
let o = ix
let i = λp (ox (Inc p))
(xs e o i)
// Adds two Bits
(Add xs ys) = (App xs λx(Inc x) ys)
// Multiplies two Bits
(Mul xs ys) =
let e = End
let o = λp (B0 (Mul p ys))
let i = λp (Add ys (B0 (Mul p ys)))
(xs e o i)
// Squares (n * 100k)
(Main n) =
let a = (FromU32 32 (* 100000 n))
let b = (FromU32 32 (* 100000 n))
(ToU32 (Mul a b))
just gives me a segfault:
$ hvm c main && clang -O2 main.c -o main -lpthread && ./main 25
Compiled to 'main.c'.
Segmentation fault (core dumped)
Tested with commit 1da2875.
Furthermore, the QuickSort example just outputs:
(Sum (QSort (Pivot) (Pivot) (Randoms 1 2500000)))
Probably because none of Randoms, Sum, or Pivot is defined ...
Now back to studying HOW.md ... ;)
If a lambda that clones its argument is itself cloned, then its clones aren't allowed to clone each-other.
See #44 for more information, as well as the superposed duplication section on HOW.md.
Example of what is not allowed:
let f = λs λz (s (s z)) // a lambda that clones its argument
let g = f // a clone of that lambda
(g f) // the clone will clone the original
These aren't problems with this implementation. They're just ideas that I wanted to dump down somewhere. Feel free to close this after reading it.
e.g.
hvm compile -M 2G main.hvm
To quote from the README as of b95388f:
This chart isn't wrong: HVM is exponentially faster for function composition, due to optimality, depending on the target function. There is no parallelism involved here. In general, if the composition of a function f has a constant-size normal form, then f^(2^N)(x) is constant-time (O(N)) on HVM, and exponential-time (O(2^N)) on GHC. This can be taken advantage of to design novel functional algorithms. I highly encourage you to try composing different functions and watching how their complexity behaves. Can you tell if it will be linear or exponential? Or how recursion will affect it? That's a very insightful experience!
The bolded section of text, "constant-time (O(N))", doesn't make sense - I think it's meant to say linear time.
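To experiment with the quoted claim, one can build the 2^N-fold composition by repeated self-composition (my own illustration; pow2 is a hypothetical name):

```haskell
-- pow2 n f is f composed with itself 2^n times: each level composes the
-- previous level with itself. GHC performs all 2^n applications when the
-- result is forced; HVM, per the README's claim, can keep the cost
-- proportional to n when f's self-composition has a small normal form.
pow2 :: Int -> (a -> a) -> (a -> a)
pow2 0 f = f
pow2 n f = let g = pow2 (n - 1) f in g . g
```

Applying pow2 n succ to 0 yields 2^n, making it easy to watch how each runtime's cost grows with n.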
Just to be sure, does it all work fine there?
Hmm, I'm actually not sure. I tried one of the benchmarks which ended up compiling ok, with only the following warning:
bash-5.1$ clang -O2 compose_id.hovm.out.c -o main -lpthread
compose_id.hovm.out.c:974:23: warning: incompatible pointer to integer conversion assigning to 'Thd' (aka 'unsigned long') from 'void *'
[-Wint-conversion]
workers[t].thread = NULL;
^ ~~~~
1 warning generated.
But when I try to run the resulting binary, it doesn't seem to finish running, instead seemingly getting stuck:
bash-5.1$ ./main
Reducing.
It hasn't printed anything after that new line for > than 30 minutes (I ended up killing it). Not sure if it's actually not working properly or if it's just my slow CPU though. When I looked at my process manager, the process didn't seem to be using any CPU cycles either.
What is your CPU?
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 40 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Vendor ID: AuthenticAMD
Model name: HP Hexa-Core 2.0GHz
CPU family: 22
Model: 48
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
Stepping: 1
Frequency boost: enabled
CPU max MHz: 2000.0000
CPU min MHz: 1000.0000
BogoMIPS: 3992.48
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_op
t pdpe1gb rdtscp lm constant_tsc rep_good acc_power nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3
cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefet
ch osvw ibs skinit wdt topoext perfctr_nb bpext ptsc perfctr_llc cpb hw_pstate ssbd vmmcall bmi1 xsaveopt arat npt lbrv svm_lo
ck nrip_save tsc_scale flushbyasid decodeassists pausefilter pfthreshold overflow_recov
Virtualization features:
Virtualization: AMD-V
Caches (sum of all):
L1d: 128 KiB (4 instances)
L1i: 128 KiB (4 instances)
L2: 2 MiB (1 instance)
NUMA:
NUMA node(s): 1
NUMA node0 CPU(s): 0-3
Vulnerabilities:
Itlb multihit: Not affected
L1tf: Not affected
Mds: Not affected
Meltdown: Not affected
Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Spectre v2: Mitigation; Full AMD retpoline, STIBP disabled, RSB filling
Srbds: Not affected
Tsx async abort: Not affected
Originally posted by @nothingnesses in #15 (comment)
Right now, HVM only supports U32. The numeric types above should be supported too. I32 and F32 should be easy to add, since they are unboxed, like U32. I64, U64 and F64 require implementing boxed numbers, since they don't fit inside a 64-bit Lnk (due to the 4-bit pointer), but shouldn't be hard. Discussion on whether we should have unboxed 60-bit variants is valid.
I'd be interested in adding support for at least the unboxed numeric types as this seems like a low hanging fruit. I just wanted to check what the concensus was between implicit/explicit casting and how operators should function in regards to mixed input types. My instinct would just be to cast up to the 'bigger' type but maybe operators should only operate on the same type - combined with explicit casts.
@VictorTaelin what do you think?
Originally posted by @samtoth in #38 (comment)
I32, F32, U64, I64, F64
HVM produces incorrect results on some inputs, one of which being:
(Main arg) = ((λa (a (λb ((λc (λd (b (c d)))) a)))) (λe (e e)))
On this input, HVM produces the result λx1 λx2 ((x1 x2) (x1 x2))
while the correct result is λx1 λx2 ((x1 x1) (x2 x2))
(as verified by https://crypto.stanford.edu/~blynn/lambda/, with the corresponding input syntax by adding dots after the variable of each lambda: ((λa. (a (λb. ((λc. (λd. (b (c d)))) a)))) (λe. (e e)))
).
Besides, on other inputs such as:
((λa λb ((b a) λc λd (λe (d a) a)) λf (f f)) λz z)
HVM produces the result λx1 (x1 λx2 (x2 _))
which is not even well-formed.
https://github.com/Kindelia/HVM/blob/14a02de71bd244247abc12749bbc37461feee33d/HOW.md?plain=1#L572
When introducing Dup-Sup, at one point you use the old notation for superposition (<r s>), but everywhere else the other style is used. This should read dup a b = {r s}, if I'm not mistaken.
(App f x) = (f x)
(Succ x) = (+ x 1)
(Main) = (Pair (App Succ 0) (App @x(Succ x) 0))
This HVM code reduces (Main)
to (Pair ((Succ) 0) 1)
. I would intuitively expect (Pair 1 1)
.
My understanding for this is that in (App Succ 0)
, the expression Succ
is actually shorthand for (Succ)
, so we have (App (Succ) 0)
which reduces to ((Succ) 0)
which does not reduce.
Would this be considered an issue, or just a gotcha for newcomers?
As of now, compiling the generated code with clang on Linux or with the -pedantic
flag gives some warnings, mostly related to constants, e.g.
./main.c:341:10: warning: expression is not an integer constant expression; folding it to a constant is a GNU extension [-Wgnu-folding-constant]
case DP0: {
^~~
Also, compiling with tinycc fails, as per the quoted comment below.
Fixing this we can move forward to try embedding a C compiler (clang? tcc? zig cc?).
Please feel free to comment other platforms/compilers that you think should be considered.
I think that I can program anything using just type constructors (and pattern matching / destructuring) and not use lambdas or lambda application at all.