Comments (7)
I disagree that this isn't the kind of code you write in Haskell in general. Yes, for the specific problem of folding over a list, it is possible to optimize it to a constant-space loop... in some cases. But that doesn't apply to most other Haskell functions, which rely heavily on deep recursion. Even fold itself sometimes can't be turned into a constant-space loop. For example, if the operator were subtraction, we couldn't use `foldl`, because `-` isn't associative.
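To make the associativity point concrete, here is a minimal illustration (plain GHC Haskell, not code from the benchmark) of why a right fold over a non-associative operator can't simply be swapped for a left fold:

```haskell
main :: IO ()
main = do
  -- (+) is associative, so the two folds agree:
  print (foldl (+) 0 [1, 2, 3])  -- 6
  print (foldr (+) 0 [1, 2, 3])  -- 6
  -- (-) is not associative, so a right fold cannot be
  -- replaced by a (strict) left fold:
  print (foldl (-) 0 [1, 2, 3])  -- ((0 - 1) - 2) - 3 = -6
  print (foldr (-) 0 [1, 2, 3])  -- 1 - (2 - (3 - 0)) = 2
```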
That said, the purpose of this benchmark was to stress-test how efficient each engine is at recursing, pattern-matching and allocating. If you replace that with a function that does no recursion, then it wouldn't be benchmarking how well the engine recurses, but how well it deals with tight loops. I've tested this, and HVM performs about 3x slower than GHC in that case, which I guess is expected, because HVM does nothing to optimize these tight loops... yet! But it absolutely could, in which case both would obviously perform the same, since they'd generate the same low-level code.
Remember, HVM is not optimized! It is the first version of a prototype, compared against a mature state-of-the-art compiler. It should be much slower than GHC, since devs have put decades of effort into optimizing it. For example, GHC has an arena-based allocator, which makes great use of the cache; HVM doesn't. Yet it performs as well as GHC on recursive functions. That's actually surprising. Think of how well it would perform if it had even a fraction of GHC's total man-hours dedicated to optimizing it. That's the message I'm trying to pass.
---
What I'm saying is: HVM is not an optimizer, it is just the runtime. So, if you want a fair comparison, it is STG vs HVM, not GHC vs HVM. A (kinda) reasonable way to do it is to compile Haskell with GHC, and then run the resulting program on both GHC and HVM. Are we on the same page here?
Yes, I agree with that. I'd even be fine with looking at the STG code, identifying code patterns that you wouldn't generate if HVM were the target, and replacing them with more perf-aware patterns, playing the role of an HVM backend for GHC.
With that in mind, your comparison isn't entirely faithful again, because you are using `Word32` instead of `Int`, which, as I said in the OP, has a broken `Enum` instance that doesn't allow fusion:

> Ultimately, making use of `foldl'` and list fusion on `[0..size-1::Int]` (`Word32` is broken) means another factor 100 of a speedup

For `Word32` and `Int32` you get STG that is very different to the `Go` function I gave.
Fortunately, the issue is already fixed on GHC master (and I believe will also be fixed in GHC 9.0.3 and 9.2.2). So there are multiple ways in which you can arrive at good code:

- Use GHC master
- Replace `Word32` with `Word` (which seems to be even faster than `Int`)
- Replace `Word32` with `Int`
So, for example

```haskell
import Data.List
import Data.Word

-- Sums a big list with fold
main = do
  let size = 100000000 :: Word
  let list = [0..size-1]
  print $ foldl' (+) 0 list
```

executes in 50ms on my machine.
Heck, we can simply skip the whole list fusion shenanigans and benchmark a counting loop in Haskell, too. I'll translate your `Go` function back to Haskell:
```haskell
import Data.Word

go :: Word32 -> Word32 -> Word32 -> Word32
go size i s = aux (i == size) size i s
  where
    aux True  size i s = s
    aux False size i s = go size (i + 1) (s + i)

main = do
  print $ go 100000000 0 0
```

Which takes 72ms on 8.10.7 (which has pretty bad sized `Word` support because of the recently added sized primops) and 26ms on GHC master (which is the same perf as with `Word`).
If I want to evaluate this 8 times, like

```haskell
import Data.Word

go :: Word32 -> Word32 -> Word32 -> Word32
go size i s = aux (i == size) size i s
  where
    aux True  size i s = s
    aux False size i s = go size (i + 1) (s + i)

main = do
  print $ go 100000000 0 0
  print $ go 100000000 0 1
  print $ go 100000000 0 2
  print $ go 100000000 0 3
  print $ go 100000000 0 4
  print $ go 100000000 0 5
  print $ go 100000000 0 6
  print $ go 100000000 0 7
```
it takes 580ms on 8.10.7 and 200ms on GHC master. That's still way faster than your parallelised program. Now you say
> I wouldn't even care if it was 10x slower in single-thread, no-allocations, non-optimal scenarios, like that one. HVM is cool because it allows automatic parallelism, no garbage-collection and beta-optimality.
If you want to focus on auto parallelisation, you might like https://wiki.haskell.org/GHC/Data_Parallel_Haskell where they tried something very similar. Meanwhile, all the DPH stuff has been removed from GHC again, because nobody was there to use or maintain it. As always, the cases where you can't make use of your cores through (superior, because it requires less sync) data/task parallelism are indeed rare, and at some point the synchronisation overhead dominates. I wouldn't expect HVM to reach 200ms on this benchmark even with 80 cores, simply because of contention issues. So "50x slower" indeed matters.
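For reference, the eight independent sums can already be run on separate cores in plain GHC Haskell using nothing but base's concurrency primitives. This is a sketch, not the benchmark code: `sumFrom` is a hypothetical name, and the size is deliberately smaller than the benchmark's so it runs quickly even unoptimised. Compile with `-threaded` and run with `+RTS -N` to actually use multiple cores.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM, forM_)

-- The strict counting loop, as a plain function: sums 0..size-1, starting from s0.
sumFrom :: Word -> Word -> Word
sumFrom size s0 = loop 0 s0
  where
    loop i s | i == size = s
             | otherwise = loop (i + 1) (s + i)

main :: IO ()
main = do
  -- Fork one thread per independent sum; ($!) forces the result
  -- on the worker thread rather than in the main thread.
  vars <- forM [0 .. 7] $ \k -> do
    v <- newEmptyMVar
    _ <- forkIO (putMVar v $! sumFrom 1000000 k)
    pure v
  -- Collect the eight results, in order.
  forM_ vars $ \v -> takeMVar v >>= print
```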
I don't buy the No GC claim, because effectively the `dup` nodes take the role of reference counters (generalised, of course!) and that is a form of GC.
Beta optimality is indeed a nice, flat-out win. But I wonder on what workloads it really matters compared to the STG-compiled programs that GHC produces (e.g., making use of transformations such as FloatOut/full laziness, simplification, eta expansion, ...). And obviously, I wonder what the costs are that pay for the win. It seems there are quite severe losses for sequential code.
This is actually the question that matters most to me: Is Beta optimality worth the losses? Will the losses get significantly smaller over time?
> That said, I'm not sure why you find this benchmark useful at all. It does no allocations, but allocations and garbage collection play a major role in functional runtimes. I don't know of any Haskell program that doesn't do a lot of allocation, so this example isn't representative of real-world usage.
That allocations play a major role in functional programs is true; but we have to compare apples to apples somehow, right?
And it seems we just established that a simple counting loop will regress by a factor of ~50 when switching to a hypothetical HVM backend for GHC (as implemented at the moment! Not saying that won't change. But I still wonder how). This is the kind of hard and honest data I'm interested in! At the same time, there are cases and programs where GHC performs notoriously badly because of churn on the GC during build-up of huge data structures. In these cases, HVM seems to be nearly on par. Your main README makes it sound like that is the common case for functional programs. I claim it is not; at some point the data structure is built up and ends up in Gen 1, where we will seldom trace it during Gen 0 GC.
Besides, Haskell often has "fast non-allocating hot loops" such as repeated `(Int)Map.lookup` over the same map. And I think due to the `dup`ping that must be involved (e.g., for N lookups over the same map, the root would have to be `dup`ped N times, correct?), that'd suffer quite badly when compiled to HVM.
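As a concrete (hypothetical) shape of such a hot loop, here is a sketch using `Data.Map.Strict` from the `containers` package that ships with GHC; the map, key range, and loop count are made up for illustration:

```haskell
import qualified Data.Map.Strict as Map

main :: IO ()
main = do
  let m = Map.fromList [(k, 2 * k) | k <- [0 .. 999 :: Int]]
  -- 100000 lookups over the same, unchanged map: no allocation in the
  -- loop body for GHC, but presumably N duplications of the map's root
  -- when compiled to HVM.
  print $ sum [Map.findWithDefault 0 (i `mod` 1000) m | i <- [0 .. 99999]]
```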
Indeed, all benchmarks you give build up a data structure and then consume it exactly once. Maybe compare examples where the same list or tree is traversed N times and benchmark that.
I stand by my initial claim that the README suggests a false balance, namely that HVM
- is almost on par with GHC's (STG to native code) backend for sequential workloads
- faster when autoparallelisation is on
- and even faster when it manages to make an asymptotic difference due to beta optimality
(2) doesn't excite me very much TBH and we can have it (had it) in Haskell, too (others might be excited by it), (3) is nice to have, just as list fusion, and I think (1) is simply not true (at the moment!), as the counting loop example demonstrates.
---
I'm not sure if we're talking about the same thing, but seems like we are. What I'm saying is: HVM is not an optimizer, it is just the runtime. So, if you want a fair comparison, it is STG vs HVM, not GHC vs HVM. A (kinda) reasonable way to do it is to compile Haskell with GHC, and then run the resulting program on both GHC and HVM. Are we on the same page here? If so, you said that the following Haskell program:
```haskell
import Data.List
import Data.Word

-- Sums a big list with fold
main = do
  let size = 100000000
  let list = [0..size-1] :: [Word32]
  print $ foldl' (+) 0 list
```
is compiled to the equivalent of:
```
(Go size i s) = (Go.aux (== i size) size i s)
(Go.aux 1 size i s) = s
(Go.aux 0 size i s) = (Go size (+ i 1) (+ s i))
(Main n) = (Go 100000000 0 0)
```
Correct? So that'd be a fair benchmark. On my machine, this takes 1.631s on HVM (and a total of just 72 bytes allocated), and 0.934s on GHC. So, yes, in this case, GHC is 1.74x faster, which isn't surprising given that it is, obviously, considerably more optimized, right? So, if these are the numbers you're interested in, there they are.
That said, I'm not sure why you find this benchmark useful at all. It does no allocations, but allocations and garbage collection play a major role in functional runtimes. I don't know of any Haskell program that doesn't do a lot of allocation, so this example isn't representative of real-world usage. `foldr` is more representative precisely because it involves allocations and garbage collection, which aren't always avoidable. If you argue that `foldr` is unfair to Haskell just because GHC spends 88% of the time collecting garbage, then this just reinforces the importance of not needing a garbage collector!
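For completeness, the `foldr` variant being referred to looks something like this. This is a sketch, not the README benchmark itself, and the size is reduced so it finishes quickly even uncompiled:

```haskell
-- foldr-based sum: consumes the list through a chain of lazy (+)
-- frames, exercising allocation and the GC rather than compiling
-- to a tight register loop.
main :: IO ()
main = print (foldr (+) 0 [0 .. 99999 :: Int])
```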
Even if HVM somehow could not be optimized to be on par with GHC on this example, which is a very unrealistic assumption, keep in mind the main point of HVM isn't to beat GHC in single-core, but to do so when either parallelism or optimality can be exploited. For example, if the algorithm above is repeated 8 times:
```haskell
import Data.List
import Data.Word

-- Sums a big list with fold
main = do
  let size = 100000000
  let list = [0..size-1] :: [Word32]
  print $
    ( foldl' (+) 0 list
    , foldl' (+) 1 list
    , foldl' (+) 2 list
    , foldl' (+) 3 list
    , foldl' (+) 4 list
    , foldl' (+) 5 list
    , foldl' (+) 6 list
    , foldl' (+) 7 list
    )
```
VS
```
(Go size i s) = (Go.aux (== i size) size i s)
(Go.aux 1 size i s) = s
(Go.aux 0 size i s) = (Go size (+ i 1) (+ s i))
(Main n) = (Tuple
  (Go 100000000 0 0)
  (Go 100000000 0 1)
  (Go 100000000 0 2)
  (Go 100000000 0 3)
  (Go 100000000 0 4)
  (Go 100000000 0 5)
  (Go 100000000 0 6)
  (Go 100000000 0 7)
)
```
Then it takes just 2.359s on HVM, versus 7.656s on GHC, making HVM 3.24x faster.
This isn't even the point though, and I am not selling HVM as faster than GHC in general (even though many times it is). That is written right on the README. I wouldn't even care if it was 10x slower in single-thread, no-allocations, non-optimal scenarios, like that one. HVM is cool because it allows automatic parallelism, no garbage-collection and beta-optimality. These things alone make it a very powerful tool to the functional programming community, and I'm not sure why you seem upset with it?
---
> Think of how well it would perform if it had a fraction of GHC's total man-hours dedicated to optimizing it?
Well, that is actually what I'm trying to find out. TBH, I could speed up the program by another factor of 10 if I made use of list fusion, entirely eliminating all sources of allocation. I wonder if these factors carry over to HVM, too, if someone put in the work to implement a few foldr/build fusion primitives (which shouldn't be too hard). It's hard to tell, because as you say it obviously hasn't seen as much careful work put into it.
But at the same time, you can write programs in basically any language and instantly be in the realm of "a factor of 30 within the perf of C/GHC/whatever". The question is whether you can ultimately deliver on the remaining constant factors. Pretending those constant factors don't exist and saying "we can do the same later on" is quite hand-wavy. Plus, speedups in GHC Haskell that are simply due to less GC pressure can't possibly carry over to HVM, because it doesn't do any GC (although it does a variant of reference counting, so that might count as GC after all). So it's a bit unproductive to do the comparison with a program that has a huge and continually growing memory residency, which is kind of a worst case for a non-incremental, generational GC, and still claim that the program is a workload that GHC should excel at.
I would much rather see "This is the most efficient way in GHC Haskell to solve this problem, and this is the most efficient way to solve it as an HVM program". And if GHC Haskell is 10 times faster in some cases, so be it! You can continually chip off the delta as you improve your VM implementation.
---
> Well, that is actually what I'm trying to find out. TBH, I could speed up the program by another factor of 10 if I made use of list fusion, entirely eliminating all sources of allocation. I wonder if these factors carry over to HVM, too, if someone put in the work to implement a few foldr/build fusion primitives (which shouldn't be too hard). It's hard to tell, because as you say it obviously hasn't seen as much careful work put into it.
Note that compile-time foldr/build fusion primitives are not required to have fusion in HVM. It happens at runtime. Now, compile-time inlining is a different thing, and, yes, aggressive inlining can result in faster programs, essentially because you moved work to compile time. HVM, once mature enough, could optimize `foldl'` exactly like GHC. There is no reason not to. Simple proof: compile HVM to Haskell, compile Haskell to Core (with GHC), compile Core back to HVM. That is why benchmarking `foldl'` is kinda pointless. GHC is the master of compile-time computation. HVM does none of that. It isn't a whole compiler. It is just the runtime. Technically it replaces STG, not GHC. In fact, compiling Haskell to HVM via GHC is one way to use it!
> I would much rather see "This is the most efficient way in GHC Haskell to solve this problem, and this is the most efficient way to solve it as an HVM program".
I think you do not realize, though, that HVM operates basically identically to GHC until higher-order lambdas are involved. For programs like `sum (range 0 n)`, optimal evaluation and lazy evaluation are the same, because this program makes no use of runtime closures. It can perfectly compile to a direct C loop, and both GHC and HVM can do that.

Does that make sense?
---
> HVM, once mature enough, could optimize foldl' exactly like GHC. Simple proof: compile HVM to Haskell, compile Haskell to Core (with GHC), compile Core back to HVM.
That's not a proof at all. But we agree on the property to assert! Note that GHC doesn't end with Core.
> I think you do not realize though that HVM operates basically identically to GHC until there are higher-order lambdas involved. For programs like sum (range 0 n), optimal evaluation and lazy evaluation are the same, because this program makes no use of runtime closures. It can perfectly compile to a direct C loop, and both GHC and HVM can do that.
`sum (range 0 n)` doesn't have any "higher-order lambdas" (not sure what that means); just regular lambdas. It is a recursive consumer `sum` called on a recursive producer `range`, the latter of which has to build up the whole data structure before it can do anything useful. At least that's the way GHC compiles it; it makes no attempt to turn a tail-recursive function into a generative corecursion (simply because that has different perf trade-offs and is not always beneficial).
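Spelled out, the producer/consumer pair has roughly this shape. This is a sketch with hypothetical names (`range`, `sum'`), not the benchmark's exact code, and a small `n` for illustration:

```haskell
-- Recursive producer: must materialise cons cells as it goes.
range :: Word -> Word -> [Word]
range i n
  | i == n    = []
  | otherwise = i : range (i + 1) n

-- Recursive consumer: one frame per element, no tail call.
sum' :: [Word] -> Word
sum' []       = 0
sum' (x : xs) = x + sum' xs

main :: IO ()
main = print (sum' (range 0 100000))
```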
As I said, your benchmark capitalises on that by a factor of 3. Perhaps because HVM is able to do the necessary conversion somehow. Or, much more likely, it is faster because it doesn't have a tracing GC. Note the amount of garbage your benchmark produces:
```
$ ./slow 10 +RTS -s
2280707264
     726,224,016 bytes allocated in the heap
   1,539,414,664 bytes copied during GC
     321,709,520 bytes maximum residency (10 sample(s))
      55,102,000 bytes maximum slop
             730 MiB total memory in use (0 MB lost due to fragmentation)

                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       683 colls,     0 par    0.405s   0.406s     0.0006s    0.0015s
  Gen  1        10 colls,     0 par    0.567s   0.567s     0.0567s    0.2281s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    0.131s  (  0.130s elapsed)
  GC      time    0.973s  (  0.973s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time    1.104s  (  1.104s elapsed)

  Alloc rate    5,550,184,596 bytes per MUT second

  Productivity  11.9% of total user, 11.8% of total elapsed
```
12% productivity. Or, in other words, 88% of the execution time is spent doing garbage collection! You are merely benchmarking the garbage collector!
I agree that GHC is miles ahead in terms of possible optimisations. But note that the argument I gave is quite similar to your argument of "But GHC does so many more optimisations, it's unfair to judge HVM by comparing to a Haskell program that is written with performance in mind".
I don't think we'll get to an agreement in the argument above. But maybe we can agree on the following:
Given a faithful reproduction of optimised STG/Core of program X.hs produced by GHC in HVM code, how much faster/slower is the HVM code to execute compared to the GHC-compiled code?
And then in theory, we wouldn't care about whether X.hs would be written with perf in mind or not, right? Since perf is important, I'd like to write X.hs in a way that is particularly efficient and not dominated by allocations. Hence I pick
```haskell
import System.Environment
import Data.List

-- Sums a big list with fold
main = do
  n <- read . head <$> getArgs :: IO Int
  let size = 1000000 * n
  let list = [0..size]
  print $ foldl' (+) 0 list
```
and see

```
$ ./fast 10 +RTS -s
50000005000000
          57,288 bytes allocated in the heap
           3,312 bytes copied during GC
          44,408 bytes maximum residency (1 sample(s))
          25,224 bytes maximum slop
               2 MiB total memory in use (0 MB lost due to fragmentation)

                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0         0 colls,     0 par    0.000s   0.000s     0.0000s    0.0000s
  Gen  1         1 colls,     0 par    0.000s   0.000s     0.0002s    0.0002s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    0.012s  (  0.012s elapsed)
  GC      time    0.000s  (  0.000s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time    0.012s  (  0.012s elapsed)

  %GC     time    0.0%  (0.0% elapsed)

  Alloc rate    4,909,298 bytes per MUT second

  Productivity  95.3% of total user, 95.4% of total elapsed
```
which is 100 times as fast as the previous example and produces the correct 64-bit result. Now I want to take the inner loop there and translate it to HVM code. Just FTR, `-O2 -ddump-final-stg` produces something like the below for me.
```
Main.main1
  :: GHC.Prim.State# GHC.Prim.RealWorld
  -> (# GHC.Prim.State# GHC.Prim.RealWorld, () #)
[GblId, Arity=1, Str=<L,U>, Unf=OtherCon []] =
\r [void_0E]
  case
    Foreign.Marshal.Alloc.allocaBytesAligned
      Foreign.Storable.$fStorableBool7
      Foreign.Storable.$fStorableBool7
      System.Environment.getArgs1
      GHC.Prim.void#
  of
  {
  Unit# ipv1_s390 [Occ=Once1!] ->
    let {
      sat_s39v [Occ=Once1, Dmd=<L,1*U>] :: GHC.Base.String
      [LclId] =
        \s []
          let {
            sat_s394 [Occ=Once1] :: GHC.Base.String
            [LclId] =
              \u []
                case ipv1_s390 of {
                  [] -> GHC.List.badHead;
                  : x_s392 [Occ=Once1] _ [Occ=Dead] -> x_s392;
                };
          } in
          case
            Text.ParserCombinators.ReadP.run Main.main4 sat_s394
          of
          sat_s395 [Occ=Once1]
          {
          __DEFAULT ->
            case Text.Read.readEither8 sat_s395 of {
              [] -> Main.main3;
              : x_s398 [Occ=Once1!] ds1_s399 [Occ=Once1!] ->
                case ds1_s399 of {
                  [] ->
                    case x_s398 of {
                      GHC.Types.I# y_s39c [Occ=Once1] ->
                        case *# [1000000# y_s39c] of y1_s39d {
                        __DEFAULT ->
                          case ># [0# y1_s39d] of {
                          __DEFAULT ->
                            let-no-escape {
                            Rec {
                            $wgo_s39f [InlPrag=NOUSERINLINE[2], Occ=LoopBreakerT[2]]
                              :: GHC.Prim.Int# -> GHC.Prim.Int# -> GHC.Base.String
                            [LclId[JoinId(2)],
                             Arity=2,
                             Str=<L,U><L,U>,
                             Unf=OtherCon []] =
                              \r [w_s39g ww_s39h]
                                case ==# [w_s39g y1_s39d] of {
                                __DEFAULT ->
                                  case
                                    +# [ww_s39h w_s39g]
                                  of
                                  sat_s39k [Occ=Once1]
                                  {
                                  __DEFAULT ->
                                    case +# [w_s39g 1#] of sat_s39j [Occ=Once1] {
                                    __DEFAULT -> $wgo_s39f sat_s39j sat_s39k;
                                    };
                                  };
                                1# ->
                                  case
                                    +# [ww_s39h w_s39g]
                                  of
                                  sat_s39l [Occ=Once1]
                                  {
                                  __DEFAULT ->
                                    case
                                      GHC.Show.$wshowSignedInt
                                        0# sat_s39l GHC.Types.[]
                                    of
                                    {
                                    (#,#) ww5_s39n [Occ=Once1]
                                          ww6_s39o [Occ=Once1] ->
                                      : [ww5_s39n ww6_s39o];
                                    };
                                  };
                                };
                            end Rec }
                            } in $wgo_s39f 0# 0#;
                          1# ->
                            case GHC.Show.$wshowSignedInt 0# 0# GHC.Types.[] of {
                            (#,#) ww5_s39q [Occ=Once1] ww6_s39r [Occ=Once1] ->
                              : [ww5_s39q ww6_s39r];
                            };
                          };
                        };
                    };
                  : _ [Occ=Dead] _ [Occ=Dead] -> Main.main2;
                };
            };
          };
    } in
    GHC.IO.Handle.Text.hPutStr'
      GHC.IO.Handle.FD.stdout sat_s39v GHC.Types.True GHC.Prim.void#;
  };
```
Which is quite close to the inner loop

```
(Go size i s) = if i == size then s else Go size (i+1) (s+i)
(Main n) =
  let size = (* n 1000000)
  (Go size 0 0)
```

modulo a bit of unrolling that can't contribute more than a factor 2 of speedup in favor of GHC.
This kind of comparison is what I'm interested in. Now you might say "This is a toy benchmark!", but then so was the original one. Plus, I really don't want to give up on C-level performance for tight loops (or at least staying within a factor of 5 of it) when I write such code.
Having numbers for this benchmark would say a lot to me. The benchmark from the main README, not so much.
---
Hello again! I missed this comment. To answer (if a little late):
> This is actually the question that matters most to me: Is Beta optimality worth the losses? Will the losses get significantly smaller over time?
My argument is that there are no losses. There is nothing in HVM that makes it inherently slower than GHC. In some cases, today, it will be, because GHC has a constellation of micro-optimizations that HVM doesn't. Sometimes GHC will go as far as compiling a long loop to a constant, so obviously it will be faster. But that's all. One is a 3-decades-old compiler and runtime, improved by major universities and companies; the other is a new runtime written by a single dev. If you want a fair comparison, take a Haskell program, compile it to GHC Core, and run it on both HVM and GHC's own runtime. You'll notice they're very close together.
> And it seems we just established that a simple counting loop will regress by a factor of ~50 when switching to a hypothetical HVM backend for GHC
No, that's not the case. Take GHC Core and run it on HVM.
> Besides, Haskell often has "fast non-allocating hot loops" such as repeated (Int)Map.lookup over the same map.
No, a suitable optimizer should replace that with a linear version, so there will be no `dup` at all. Kind should take care of that kind of situation.
> I stand by my initial claim that the README suggests a false balance, namely that HVM
>
> - is almost on par with GHC's (STG to native code) backend for sequential workloads
> - faster when autoparallelisation is on
> - and even faster when it manages to make an asymptotic difference due to beta optimality
I do not suggest that. My claim is that HVM is getting closer and closer (but not almost on par) with GHC in general, and will be significantly faster than it soon (in some cases, already) for real applications due to parallelism and GC-freedom. If you see any sentence that suggests otherwise, please let me know.