Hello, I'm really interested in truffleruby and started to benchmark a simple use-

Bad Performance with bigdecimal about truffleruby HOT 6 CLOSED

oracle commented on July 29, 2024

Bad Performance with bigdecimal

Comments (6)

pitr-ch commented on July 29, 2024

Hi, thank you for your interest in TruffleRuby. How long does the benchmark run? TruffleRuby needs time to warm up. Please also make sure that Truffle::Graal.graal? returns true. TruffleRuby's BigDecimal is essentially a wrapper around Java's BigDecimal.
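A quick way to confirm which implementation a benchmark is actually running on, as a minimal sketch: it relies only on RUBY_ENGINE, RUBY_ENGINE_VERSION and the Truffle::Graal.graal? check suggested above. describe_engine is a hypothetical helper name, and the respond_to? guard is there because newer TruffleRuby releases may not provide Truffle::Graal.graal?.

```ruby
# Describe the running Ruby implementation. On TruffleRuby this also reports
# Truffle::Graal.graal?, the check suggested in this thread; the guard avoids
# errors on implementations (or releases) where that method does not exist.
def describe_engine
  if RUBY_ENGINE == "truffleruby" && defined?(Truffle::Graal) &&
     Truffle::Graal.respond_to?(:graal?)
    "#{RUBY_ENGINE} #{RUBY_ENGINE_VERSION} (graal? = #{Truffle::Graal.graal?})"
  else
    "#{RUBY_ENGINE} #{RUBY_ENGINE_VERSION}"
  end
end

puts describe_engine
```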

prdn commented on July 29, 2024

@pitr-ch Yes, Truffle::Graal.graal? returns true.

After changing the code as below (adding a warm-up loop before the timed loop), I get better results, but it still takes 492 ms on TruffleRuby versus 92 ms on CRuby.

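```ruby
require 'bigdecimal'

# Warm-up loop: run the workload once before timing it.
cnt = 0
while true do
  cc = BigDecimal(cnt)
  cnt += 1
  break if cnt > 100000
end

# Timed loop: the same workload, measured with Time.now.
cnt = 0
t1 = Time.now
while true do
  cc = BigDecimal(cnt)
  cnt += 1
  break if cnt > 100000
end

puts((Time.now - t1) * 1000) # elapsed milliseconds
```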
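Timing a single pass, even after one warm-up pass, can miss the point at which a JIT such as Graal has finished compiling. As a minimal sketch of a slightly more careful hand-rolled measurement (assuming only the standard library plus bigdecimal), the code below repeats the same workload for several rounds and times each one with a monotonic clock, so you can see whether later rounds get faster. run_rounds and the round count are arbitrary illustrative choices; benchmark-ips handles warm-up far more systematically.

```ruby
require 'bigdecimal'

# Run the same workload for several rounds and time each one with a
# monotonic clock (unaffected by wall-clock adjustments). If a JIT is still
# warming up, the early rounds should be noticeably slower than later ones.
# run_rounds is a hypothetical helper, not part of any library.
def run_rounds(rounds, n)
  Array.new(rounds) do
    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    n.times { |i| BigDecimal(i) }
    (Process.clock_gettime(Process::CLOCK_MONOTONIC) - start) * 1000.0
  end
end

# 100_001 iterations matches cnt = 0..100000 in the loop above.
times_ms = run_rounds(5, 100_001)
times_ms.each_with_index { |ms, i| puts format("round %2d: %8.2f ms", i + 1, ms) }
```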

chrisseaton commented on July 29, 2024

I rewrote your benchmark using benchmark-ips, which is designed to accommodate optimising implementations of Ruby.

```ruby
require 'bigdecimal'
require 'benchmark/ips'

Benchmark.ips do |x|
  x.iterations = 3

  x.report("bigdecimal") do
    cnt = 0
    while true do
      cc = BigDecimal(cnt)
      cnt += 1
      break if cnt > 100000
    end
  end
end
```

MRI 2.3.3: 4.276k (±12.8%) i/s
Rubinius 3.60: 2.554 (±78.3%) i/s
JRuby 9.1.6.0: 11.310 (± 8.8%) i/s
GraalVM 0.19: crashes :)

So we have a bug in compiling BigDecimal code, which we need to fix. The fact that you weren't seeing the bug also means your benchmark wasn't triggering compilation, which is why I used benchmark-ips. Note too that JRuby is slow with BigDecimal. I think Java's BigDecimal is a bit slow, and we use it as well, so I don't expect TruffleRuby to be much faster either. I'm not sure what is wrong with Rubinius.

BigDecimal is not something that is going to be much faster on TruffleRuby. Every Ruby implementation uses a system-level BigDecimal implementation that is probably already well-optimised native code, and there isn't much actual Ruby code here to optimise outside of that.

chrisseaton commented on July 29, 2024

I wrote about why I rewrote your benchmark here: https://github.com/graalvm/truffleruby/blob/truffle-head/doc/user/reporting-performance-problems.md.

prdn commented on July 29, 2024

Thanks @chrisseaton

chrisseaton commented on July 29, 2024

From experience, this looks like it's going to be a relatively simple problem to fix, so for interest I thought I'd document how I fix issues like this in TruffleRuby.

With the benchmark rewritten to use benchmark-ips, Graal now attempts to compile the code, and we see errors being reported. The errors don't stop the program, because an error in the compiler doesn't mean the program can't continue. We'd like to stop when the error occurs, though, so I use the Graal option -J-Dgraal.TruffleCompilationExceptionsAreFatal=true. Now I see the error and the program stops.
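The policy described here (a compiler failure is reported and the program keeps running, unless failures are made fatal) can be modelled in a few lines. This is a hypothetical sketch, not how Graal or TruffleRuby implement it: compile, interpret, execute and the unbounded_recursion flag are all made-up names, and fatal: true only stands in for the intent of TruffleCompilationExceptionsAreFatal.

```ruby
# Hypothetical model: by default a compilation failure is reported and
# execution continues in the interpreter; with fatal: true it propagates.
class CompilationError < StandardError; end

# Stand-in "compiler": refuses nodes flagged as unbounded recursion.
def compile(node)
  raise CompilationError, "too deep inlining" if node[:unbounded_recursion]
  ->(x) { x * node[:factor] }
end

# Stand-in interpreter: can always run the node, just more slowly.
def interpret(node, x)
  x * node[:factor]
end

def execute(node, x, fatal: false)
  compile(node).call(x)
rescue CompilationError => e
  raise if fatal
  warn "compilation failed: #{e.message}; continuing in the interpreter"
  interpret(node, x)
end

ok  = { factor: 2 }
bad = { factor: 2, unbounded_recursion: true }
p execute(ok, 21)   # => 42
p execute(bad, 21)  # => 42, after a warning on stderr

begin
  execute(bad, 21, fatal: true)
rescue CompilationError => e
  puts "stopped: #{e.message}"
end
```

The same result comes back either way; only the fatal setting decides whether the failure becomes visible enough to debug.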

I then switched from GraalVM to a development checkout of TruffleRuby with a build of the latest graal-core. The error is still there.

The error output is very verbose and looks like a kind of stack trace. I read it from the bottom up, and at the bottom I see the stack trace from the compiler itself.

```
...
org.graalvm.compiler.truffle.OptimizedCallTarget.callRoot(OptimizedCallTarget.java:214)
	at org.graalvm.compiler.replacements.PEGraphDecoder.tooDeepInlining(PEGraphDecoder.java:729)
	at org.graalvm.compiler.replacements.PEGraphDecoder.doInline(PEGraphDecoder.java:585)
	at org.graalvm.compiler.replacements.PEGraphDecoder.tryInline(PEGraphDecoder.java:570)
	at org.graalvm.compiler.replacements.PEGraphDecoder.trySimplifyInvoke(PEGraphDecoder.java:490)
	at org.graalvm.compiler.replacements.PEGraphDecoder.handleInvoke(PEGraphDecoder.java:464)
	at org.graalvm.compiler.nodes.GraphDecoder.processNextNode(GraphDecoder.java:550)
	at org.graalvm.compiler.nodes.GraphDecoder.decode(GraphDecoder.java:393)
	at org.graalvm.compiler.replacements.PEGraphDecoder.decode(PEGraphDecoder.java:398)
...
```

The error is too-deep inlining in the partial evaluator's graph-decoding phase. The partial evaluator combines a runtime data structure, the AST of this benchmark, with code, the interpreter's implementation methods written in Java, and partially evaluates (executes) as much of it as it can, leaving only the parts that depend on runtime data. That residual program is then fed into the rest of the Graal compiler. The graph decoder is the part that takes a Java program in bytecode and produces a Graal graph, on which the partial evaluation algorithm runs.
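To make "execute as much as it can, leaving only the parts that depend on runtime data" concrete, here is a toy partial evaluator in Ruby. It is a hypothetical, heavily simplified sketch (constant folding over a four-node expression AST; peval and evaluate are made-up names), not how Graal works internally: Graal partially evaluates Java interpreter methods against a Truffle AST, whereas this only folds the parts of an expression whose inputs are known up front and keeps the rest as a residual program.

```ruby
# Toy online partial evaluator. Nodes are arrays: [:const, n], [:var, name],
# [:add, a, b], [:mul, a, b]. Variables bound in `static` are known ahead of
# time; any other variable is dynamic and stays in the residual AST.
def peval(node, static)
  case node[0]
  when :const
    node
  when :var
    static.key?(node[1]) ? [:const, static[node[1]]] : node
  when :add, :mul
    a = peval(node[1], static)
    b = peval(node[2], static)
    if a[0] == :const && b[0] == :const
      # Both operands are known: perform the operation now.
      [:const, node[0] == :add ? a[1] + b[1] : a[1] * b[1]]
    else
      # At least one operand depends on runtime data: keep the node.
      [node[0], a, b]
    end
  else
    raise ArgumentError, "unknown node #{node[0]}"
  end
end

# Ordinary interpreter, used to run the residual program at runtime.
def evaluate(node, env)
  case node[0]
  when :const then node[1]
  when :var   then env.fetch(node[1])
  when :add   then evaluate(node[1], env) + evaluate(node[2], env)
  when :mul   then evaluate(node[1], env) * evaluate(node[2], env)
  end
end

ast = [:add, [:mul, [:var, :scale], [:const, 10]], [:var, :x]]
residual = peval(ast, { scale: 3 })
p residual                      # => [:add, [:const, 30], [:var, :x]]
p evaluate(residual, { x: 5 })  # => 35
```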

Looking further up, I see what looks like a kind of stack overflow, or infinite recursion.

```
java.math.BigInteger.square(BigInteger.java:1899)
java.math.BigInteger.squareToomCook3(BigInteger.java:2049)
java.math.BigInteger.square(BigInteger.java:1899)
java.math.BigInteger.squareToomCook3(BigInteger.java:2049)
java.math.BigInteger.square(BigInteger.java:1899)
java.math.BigInteger.squareToomCook3(BigInteger.java:2049)
java.math.BigInteger.square(BigInteger.java:1899)
java.math.BigInteger.multiply(BigInteger.java:1491)
java.math.BigInteger.pow(BigInteger.java:2302)
java.math.BigDecimal.bigTenToThe(BigDecimal.java:3543)
java.math.BigDecimal.bigDigitLength(BigDecimal.java:3820)
java.math.BigDecimal.precision(BigDecimal.java:2240)
java.math.BigDecimal.doRound(BigDecimal.java:3988)
java.math.BigDecimal.plus(BigDecimal.java:2195)
org.truffleruby.stdlib.bigdecimal.CreateBigDecimalNode.create(CreateBigDecimalNode.java:83)
```

The partial evaluator inlines all Java methods. This is essential because the AST interpreter consists of lots of small methods, and if we didn't inline them we wouldn't see many optimisation opportunities. To prevent inlining, you use an annotation called @TruffleBoundary.

If your program recurses infinitely, Truffle will try to inline all of those calls and would eventually run out of memory (or rather, it realises it's going too far and stops, which is what happened here).
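The interplay between inlining everything, boundaries, and giving up on too-deep inlining can be modelled with a small recursive inliner. This is a hypothetical sketch, not Graal's algorithm: the call graph is a simplified copy of the chain in the stack trace above, boundaries play the role @TruffleBoundary plays, and max_depth: 50 is an arbitrary limit rather than Graal's real setting.

```ruby
# Simplified call graph taken from the stack trace above (callee lists only).
CALLS = {
  create:          [:round],
  round:           [:plus],
  plus:            [:doRound],
  doRound:         [:precision],
  precision:       [:bigDigitLength],
  bigDigitLength:  [:bigTenToThe],
  bigTenToThe:     [:pow],
  pow:             [:multiply],
  multiply:        [:square],
  square:          [:squareToomCook3],
  squareToomCook3: [:square] # recursion whose depth isn't known statically
}.freeze

class TooDeepInlining < StandardError; end

# Inline every call below `method` unless the callee is a boundary, which
# stays as a real call. Returns how many call sites were inlined, or raises
# once the inlining depth exceeds max_depth.
def inline(method, boundaries, depth = 0, max_depth: 50)
  raise TooDeepInlining, "gave up at #{method} (depth #{depth})" if depth > max_depth

  CALLS.fetch(method, []).sum do |callee|
    next 0 if boundaries.include?(callee)
    1 + inline(callee, boundaries, depth + 1, max_depth: max_depth)
  end
end

begin
  inline(:create, [])
rescue TooDeepInlining => e
  puts "no boundary: #{e.message}"
end

p inline(:create, [:round]) # => 0, the call to round is left as a real call
p inline(:create, [:plus])  # => 1, round is inlined but plus is not
```

With no boundary the square/squareToomCook3 cycle drives the depth past the limit; marking round (or anything above the cycle) as a boundary lets inlining finish.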

I look at where the recursion starts and find the last bit of code that we wrote ourselves, which is CreateBigDecimalNode.create. There is some complexity here: creating a BigDecimal appears to involve reading some kind of global state in the mode variable. This isn't unusual for Ruby.

https://github.com/graalvm/truffleruby/blob/683ade0ee4a7dee4bfc234d1cd95aad76b286f94/truffle/src/main/java/org/truffleruby/stdlib/bigdecimal/CreateBigDecimalNode.java#L85-L85
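For context on what "global state" means on the Ruby side, the sketch below shows BigDecimal's rounding mode: BigDecimal.mode changes it for subsequent operations instead of it being passed to each call, and BigDecimal.save_rounding_mode restores it afterwards. Whether this is exactly what the mode variable in CreateBigDecimalNode reads is an assumption; the example only illustrates the Ruby-level behaviour, and round_half is a hypothetical helper.

```ruby
require 'bigdecimal'

# The rounding mode is ambient state: nothing about it is passed to #round.
def round_half(str)
  BigDecimal(str).round
end

default_result = round_half("2.5") # default mode is ROUND_HALF_UP

# Change the mode only for the duration of the block; it is restored after.
banker_result = BigDecimal.save_rounding_mode do
  BigDecimal.mode(BigDecimal::ROUND_MODE, BigDecimal::ROUND_HALF_EVEN)
  round_half("2.5")
end

p default_result # => 3
p banker_result  # => 2
```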

The stack trace shows a call to BigDecimal.plus, but the Java code doesn't contain one. Sometimes the partial evaluator can't report exact source information; I'm not sure why. However, I can see a call to BigDecimal.round, which I know calls BigDecimal.plus.

It looks like BigDecimal.plus just isn't code that makes sense to partially evaluate, since it contains recursion that can't be shown to be bounded, either statically or dynamically with the profiling information we have. Rounding logic is often complex. The first thing I'll try is adding one of those @TruffleBoundary annotations to stop the partial evaluator from inlining this call to round (and so to plus).

That works: no more errors. There are probably other errors like this in the BigDecimal code, because we clearly haven't exercised BigDecimal in the compiler much, so if you try another BigDecimal benchmark you might see a similar error. We have a systematic way to find these (running the specs with the compiler set to be very aggressive, always compiling things), but I don't have time right now to really dig into BigDecimal.

I was running the benchmark incorrectly last time (I left a |times| parameter in the benchmark-ips block, which makes it behave differently, but then I wasn't actually using that parameter), so here are some fresh results.
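On the |times| point: benchmark-ips lets a report block take a parameter, in which case it passes an iteration count and expects the block to run the workload that many times itself. The hypothetical mini-harness below (not benchmark-ips's implementation; iterations_per_second, the batch size and the time budget are made up) follows the same contract, to show how a block that accepts the parameter but ignores it gets credited with far more iterations than it actually ran.

```ruby
# Hypothetical mini-harness. Blocks that declare a parameter receive a batch
# size and are credited with that many iterations per call.
def iterations_per_second(batch: 1_000, seconds: 0.2, &block)
  takes_times = block.arity == 1
  clock = -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }
  iterations = 0
  start = clock.call
  elapsed = 0.0
  while elapsed < seconds
    if takes_times
      block.call(batch) # the block is trusted to run its workload `batch` times
      iterations += batch
    else
      block.call
      iterations += 1
    end
    elapsed = clock.call - start
  end
  iterations / elapsed
end

workload = -> { 200.times { |i| i * i } }

# Correct: the block loops over the count it is given.
correct = iterations_per_second { |times| times.times { workload.call } }

# Incorrect: the block accepts |times| but runs the workload only once.
ignored = iterations_per_second { |_times| workload.call }

puts format("uses times:    %14.1f i/s", correct)
puts format("ignores times: %14.1f i/s (inflated)", ignored)
```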

TruffleRuby:

```
Warming up --------------------------------------
          bigdecimal     1.000  i/100ms
          bigdecimal     1.000  i/100ms
          bigdecimal     6.000  i/100ms
Calculating -------------------------------------
          bigdecimal     64.757  (±17.0%) i/s -    312.000  in   5.044669s
          bigdecimal     65.775  (±13.7%) i/s -    324.000  in   5.054271s
          bigdecimal     69.920  (± 2.9%) i/s -    354.000  in   5.067837s
```

MRI 2.4.0:

```
Warming up --------------------------------------
          bigdecimal     1.000  i/100ms
          bigdecimal     1.000  i/100ms
          bigdecimal     1.000  i/100ms
Calculating -------------------------------------
          bigdecimal     13.369  (± 7.5%) i/s -     67.000  in   5.022234s
          bigdecimal     13.166  (± 0.0%) i/s -     66.000  in   5.018765s
          bigdecimal     13.580  (± 7.4%) i/s -     68.000  in   5.016456s
```

JRuby 9.1.6.0:

```
Warming up --------------------------------------
          bigdecimal    20.000  i/100ms
          bigdecimal    23.000  i/100ms
          bigdecimal    22.000  i/100ms
Calculating -------------------------------------
          bigdecimal    227.979  (± 3.1%) i/s -      1.144k in   5.023113s
          bigdecimal    221.881  (± 4.1%) i/s -      1.122k in   5.064217s
          bigdecimal    226.781  (± 4.4%) i/s -      1.144k in   5.053341s
```

Rubinius 3.60:

```
Warming up --------------------------------------
          bigdecimal     1.000  i/100ms
          bigdecimal     1.000  i/100ms
          bigdecimal     1.000  i/100ms
Calculating -------------------------------------
          bigdecimal      2.082  (± 0.0%) i/s -     10.000
          bigdecimal      2.562  (±78.1%) i/s -     10.000
          bigdecimal      2.931  (±68.2%) i/s -     10.000
```

So relative to MRI, Rubinius runs at about 0.2x the speed (roughly 4.6x slower), JRuby is a whopping 16.7x faster, and TruffleRuby is 5x faster. So we still have some work to do there: there's no reason we should be any slower than JRuby. But as I say, I don't have time right now to work on it for its own sake. If you give me more BigDecimal issues that you actually encounter and actually want fixed, I'll fix them. I'm not sure what is wrong with Rubinius; you could open an issue for that if you wanted, as I think they want issues for cases where they're slower than MRI.
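The ratios can be checked against the measurements. A small sketch, assuming they were taken from the last "Calculating" row of each run (that choice of row is an assumption): divide each implementation's iterations per second by MRI's.

```ruby
# Last "Calculating" row of each run above, in iterations per second.
IPS = {
  "MRI 2.4.0"     => 13.580,
  "JRuby 9.1.6.0" => 226.781,
  "TruffleRuby"   => 69.920,
  "Rubinius 3.60" => 2.931
}.freeze

baseline = IPS.fetch("MRI 2.4.0")
RATIOS = IPS.transform_values { |ips| (ips / baseline).round(1) }

RATIOS.each { |name, ratio| puts format("%-14s %5.1fx MRI", name, ratio) }
```

This gives 16.7x for JRuby, 5.1x for TruffleRuby and 0.2x for Rubinius, matching the figures quoted above.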

Thanks for the issue! The fix is in 26e6a40, but it might not make it into GraalVM 0.20, as that release is very close.
