Comments (4)
We got interesting results!
I used GraalVM 0.26 from OTN and MRI 2.4.1.
The first change I made to the benchmark is to allow it to run for longer, by adding
x.iterations = 5
to both Benchmark.ips blocks.
This runs each benchmark five times consecutively, which allows the results to stabilize.
Let's run each Benchmark.ips group one by one.
Otherwise, the second benchmark would run a compiled version of
Array#to_proc specialized for the combined usage of the first and second benchmarks.
To do that, I simply commented out the Benchmark.ips group that should not run.
First benchmark: include?
MRI:
$ ruby -v
ruby 2.4.1p111 (2017-03-22 revision 58053) [x86_64-linux]
$ ruby bench_array_to_proc_iters.rb
Warming up --------------------------------------
#to_proc include? 1.000 i/100ms
block include? 1.000 i/100ms
#to_proc include? 1.000 i/100ms
block include? 1.000 i/100ms
#to_proc include? 1.000 i/100ms
block include? 1.000 i/100ms
#to_proc include? 1.000 i/100ms
block include? 1.000 i/100ms
#to_proc include? 1.000 i/100ms
block include? 1.000 i/100ms
Calculating -------------------------------------
#to_proc include? 7.063 (± 0.0%) i/s - 36.000 in 5.097487s
block include? 8.553 (± 0.0%) i/s - 43.000 in 5.027790s
#to_proc include? 7.028 (± 0.0%) i/s - 36.000 in 5.122740s
block include? 8.500 (± 0.0%) i/s - 43.000 in 5.061580s
#to_proc include? 7.036 (± 0.0%) i/s - 36.000 in 5.117119s
block include? 8.527 (± 0.0%) i/s - 43.000 in 5.043421s
#to_proc include? 7.022 (± 0.0%) i/s - 36.000 in 5.128119s
block include? 8.501 (± 0.0%) i/s - 43.000 in 5.060190s
#to_proc include? 7.030 (± 0.0%) i/s - 36.000 in 5.121864s
block include? 8.498 (± 0.0%) i/s - 43.000 in 5.062551s
Comparison:
block include?: 8.5 i/s
#to_proc include?: 7.0 i/s - 1.21x slower
So that matches your results: there is a significant overhead on MRI.
$ ~/code/graalvm-0.26/bin/ruby -v
truffleruby 0.26, like ruby 2.3.3 <OpenJDK 64-Bit Graal VM 1.8.0_121-b13 with Graal> [linux-x86_64]
# I reuse the benchmark-ips installed by MRI for convenience here
# One could also do ~/code/graalvm-0.26/bin/ruby -S gem install benchmark-ips,
# but for that TruffleRuby needs to be integrated with a Ruby manager
# (or use TRUFFLERUBY_RESILIENT_GEM_HOME=true).
$ ~/code/graalvm-0.26/bin/ruby -I~/.gem/ruby/2.4.1/gems/benchmark-ips-2.7.2/lib bench_array_to_proc_iters.rb
Warming up --------------------------------------
#to_proc include? 8.000 i/100ms
block include? 11.000 i/100ms
#to_proc include? 15.000 i/100ms
block include? 13.000 i/100ms
#to_proc include? 33.000 i/100ms
block include? 21.000 i/100ms
#to_proc include? 40.000 i/100ms
block include? 26.000 i/100ms
#to_proc include? 40.000 i/100ms
block include? 26.000 i/100ms
Calculating -------------------------------------
#to_proc include? 179.213 (±11.2%) i/s - 880.000 in 5.011706s
block include? 254.029 (±23.6%) i/s - 1.144k in 5.024343s
#to_proc include? 452.736 (± 2.0%) i/s - 2.280k in 5.038072s
block include? 282.178 (± 1.8%) i/s - 1.430k in 5.069233s
#to_proc include? 453.829 (± 1.1%) i/s - 2.280k in 5.024652s
block include? 283.608 (± 0.7%) i/s - 1.430k in 5.042473s
#to_proc include? 454.198 (± 1.1%) i/s - 2.280k in 5.020536s
block include? 282.818 (± 1.1%) i/s - 1.430k in 5.057029s
#to_proc include? 455.261 (± 0.9%) i/s - 2.280k in 5.008603s
block include? 273.327 (± 5.1%) i/s - 1.378k in 5.055163s
Comparison:
#to_proc include?: 455.3 i/s
block include?: 273.3 i/s - 1.67x slower
For the first benchmark: instead of 7/8.5 iterations/s (#to_proc/block), TruffleRuby runs 455/273 iterations/s.
That's a 65x/32x speedup over MRI!
But wait, #to_proc include? is actually 1.67x faster than the block version?
Actually, there is a reason for that!
The benchmark reads the variable range, captured two lexical scopes above, in the inner loop:
array_of_numbers = 1..1_000_000
range = 20..40_000
Benchmark.ips do |x|
x.iterations = 5
x.report '#to_proc include?' do
array_of_numbers.map &[range, :include?]
end
x.report 'block include?' do
array_of_numbers.map { |number| range.include? number }
end
end
But with Array#to_proc we actually read it only once per iteration,
and then read it from the lambda in #to_proc, which folds away.
To be fairer, we can store range in a constant MY_RANGE:
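For illustration, here is the block version with the captured local replaced by a constant, on a tiny input (a simplified sketch, not the actual benchmark file): a constant lookup is resolved once, so the JIT can fold it to a fixed value inside the compiled block.

```ruby
# MY_RANGE is a constant: resolved once, so Graal can treat it as a
# compile-time constant in the block, unlike a local variable
# captured two lexical scopes away, which is re-read on every call.
MY_RANGE = 20..40_000

numbers = [1, 25, 40_000, 50_000]
p numbers.map { |number| MY_RANGE.include? number }
# => [false, true, true, false]
```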
$ ~/code/graalvm-0.26/bin/ruby -I~/.gem/ruby/2.4.1/gems/benchmark-ips-2.7.2/lib bench_array_to_proc_iters_const.rb
Warming up --------------------------------------
#to_proc include? 8.000 i/100ms
block include? 22.000 i/100ms
#to_proc include? 15.000 i/100ms
block include? 27.000 i/100ms
#to_proc include? 34.000 i/100ms
block include? 31.000 i/100ms
#to_proc include? 77.000 i/100ms
block include? 77.000 i/100ms
#to_proc include? 77.000 i/100ms
block include? 77.000 i/100ms
Calculating -------------------------------------
#to_proc include? 176.797 (± 9.6%) i/s - 924.000 in 5.292224s
block include? 726.082 (±18.5%) i/s - 3.311k in 5.038979s
#to_proc include? 773.143 (± 1.9%) i/s - 3.927k in 5.081265s
block include? 774.650 (± 2.2%) i/s - 3.927k in 5.071775s
#to_proc include? 775.459 (± 1.5%) i/s - 3.927k in 5.065373s
block include? 750.786 (± 3.2%) i/s - 3.773k in 5.030590s
#to_proc include? 749.608 (± 2.5%) i/s - 3.773k in 5.036557s
block include? 749.925 (± 3.9%) i/s - 3.773k in 5.039326s
#to_proc include? 760.790 (± 3.2%) i/s - 3.850k in 5.065920s
block include? 767.078 (± 2.7%) i/s - 3.850k in 5.023114s
Comparison:
block include?: 767.1 i/s
#to_proc include?: 760.8 i/s - same-ish: difference falls within error
Now there is no significant difference between the two versions, and we are
almost 100x faster than MRI (changing to a constant did not change MRI's score).
Second benchmark: append
This benchmark has the same problem as the one above: array_to_append
should be a constant (ARRAY_TO_APPEND) to be fair.
So we change the benchmark accordingly.
$ ruby bench_array_to_proc_iters_const.rb
Warming up --------------------------------------
#to_proc append 81.937k i/100ms
block append 152.283k i/100ms
#to_proc append 80.402k i/100ms
block append 142.596k i/100ms
#to_proc append 78.866k i/100ms
block append 149.598k i/100ms
#to_proc append 78.686k i/100ms
block append 149.088k i/100ms
#to_proc append 76.223k i/100ms
block append 147.299k i/100ms
Calculating -------------------------------------
#to_proc append 926.243k (± 4.1%) i/s - 4.650M in 5.028747s
block append 1.952M (± 6.7%) i/s - 9.722M in 5.003298s
#to_proc append 921.311k (± 5.1%) i/s - 4.650M in 5.060526s
block append 2.008M (± 4.7%) i/s - 10.016M in 5.000673s
#to_proc append 942.812k (± 3.5%) i/s - 4.726M in 5.019020s
block append 1.971M (± 4.8%) i/s - 9.869M in 5.018637s
#to_proc append 883.998k (± 6.9%) i/s - 4.421M in 5.026459s
block append 2.019M (± 3.5%) i/s - 10.164M in 5.040880s
#to_proc append 911.273k (± 4.4%) i/s - 4.573M in 5.028715s
block append 1.982M (± 5.3%) i/s - 10.016M in 5.070194s
Comparison:
block append: 1981502.9 i/s
#to_proc append: 911273.3 i/s - 2.17x slower
That's a pretty big overhead.
$ ~/code/graalvm-0.26/bin/ruby -I~/.gem/ruby/2.4.1/gems/benchmark-ips-2.7.2/lib bench_array_to_proc_iters_const.rb
Warming up --------------------------------------
#to_proc append 637.580k i/100ms
block append 780.447k i/100ms
#to_proc append 749.302k i/100ms
block append 877.479k i/100ms
#to_proc append 1.042M i/100ms
block append 1.066M i/100ms
#to_proc append 977.468k i/100ms
block append 1.095M i/100ms
#to_proc append 1.093M i/100ms
block append 1.101M i/100ms
Calculating -------------------------------------
#to_proc append 14.166M (±14.7%) i/s - 65.610M in 5.034431s
block append 14.582M (±10.2%) i/s - 70.452M in 5.049298s
#to_proc append 14.632M (± 2.5%) i/s - 73.264M in 5.010349s
block append 14.578M (± 4.0%) i/s - 73.754M in 5.068358s
#to_proc append 14.304M (± 4.0%) i/s - 72.170M in 5.054003s
block append 14.819M (± 1.8%) i/s - 74.855M in 5.053035s
#to_proc append 14.692M (± 2.3%) i/s - 74.357M in 5.064026s
block append 14.861M (± 1.5%) i/s - 74.855M in 5.038001s
#to_proc append 14.744M (± 1.5%) i/s - 74.357M in 5.044541s
block append 14.876M (± 1.5%) i/s - 74.855M in 5.033147s
Comparison:
block append: 14876012.3 i/s
#to_proc append: 14743577.5 i/s - same-ish: difference falls within error
No significant difference. Here again, Array#to_proc has no overhead!
And TruffleRuby is 7.5x/16x faster than MRI (block/#to_proc).
The speedup is smaller here because this benchmark needs to allocate a lot of new arrays.
The benchmark files I used are in this gist.
from truffleruby.
Hello,
This looks like it should optimize very well under TruffleRuby, if the method containing the call to to_proc is compiled.
- Partial Escape Analysis should see through the literal arrays you convert to a Proc, and so constant-fold Array#[], Array#empty?, etc. is_a? should fold, and send as well.
- The *splats should optimize well, as these arrays are escape-analyzed as well and are small.
I would propose, as a first step, writing a benchmark of a few examples you care about, like the ones above, using benchmark/ips. Then we can assess the current performance and more easily dig into potential issues.
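To make the analysis concrete, here is one plausible minimal Array#to_proc, written to match the call sites in the benchmarks (e.g. `array.map &[range, :include?]`) and to exercise the operations mentioned (Array#[], Array#empty?, is_a?, send, and *splats); the actual implementation under discussion may differ:

```ruby
class Array
  # Hypothetical sketch: [receiver, :message, extra...] becomes a proc
  # that calls receiver.message(extra..., arg).
  def to_proc
    raise ArgumentError, 'cannot convert an empty Array to a Proc' if empty?
    receiver = self[0]
    message  = self[1]
    message  = message.to_sym if message.is_a?(String)
    curried  = self[2..-1]
    lambda { |*args| receiver.send(message, *curried, *args) }
  end
end

range = 20..40_000
p [1, 25, 50_000].map(&[range, :include?])  # => [false, true, false]
```

If the comment's analysis holds, the literal array and the splats here are escape-analyzed away once the caller is compiled, which is why the #to_proc variant can match the plain-block variant.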
from truffleruby.
Thank you. Here is a benchmark. The results on MRI 2.4:
Warming up --------------------------------------
Array#to_proc 1.000 i/100ms
default 1.000 i/100ms
Calculating -------------------------------------
Array#to_proc 2.676 (± 0.0%) i/s - 14.000 in 5.233228s
default 3.301 (± 0.0%) i/s - 17.000 in 5.151718s
Comparison:
default: 3.3 i/s
Array#to_proc: 2.7 i/s - 1.23x slower
Warming up --------------------------------------
Array#to_proc 25.267k i/100ms
default 48.108k i/100ms
Calculating -------------------------------------
Array#to_proc 309.345k (± 0.6%) i/s - 1.567M in 5.064304s
default 708.505k (± 0.9%) i/s - 3.560M in 5.025035s
Comparison:
default: 708504.6 i/s
Array#to_proc: 309344.9 i/s - 2.29x slower
from truffleruby.
Thank you. I really appreciate it. Very informative and educational.
from truffleruby.