Question: Performance difference between MRI Ruby and jRuby

Answers 2
Added at 2016-11-10 14:11
Question

While doing some benchmarking to answer this question about the fastest way to concatenate arrays, I was surprised that the same benchmarks ran a lot slower in jRuby.

Does this mean that the old adage about jRuby being faster than MRI Ruby no longer holds? Or is this about how arrays are treated in jRuby?

Here are the benchmark and the results for both MRI Ruby 2.3.0 and jRuby 9.1.2.0. Both ran on a 64-bit Windows 7 box, with all 4 processors 50-60% busy and ± 5.5GB of memory in use. jRuby had to be started with the parameter -J-Xmx1500M to provide enough heap space. I had to remove the test with push because of a "stack level too deep" error, and I also removed the slowest methods so the tests would not take too long. Java runtime used: 1.7.0_21

require 'benchmark'
N = 100

class Array
  def concat_all 
    self.reduce([], :+)
  end
end

# small arrays
a = (1..10).to_a
b = (11..20).to_a
c = (21..30).to_a

Benchmark.bm do |r|
  r.report('plus       ')  { N.times { a + b + c }}
  r.report('concat     ') { N.times { [].concat(a).concat(b).concat(c) }}
  r.report('splash     ') { N.times {[*a, *b, *c]} }
  r.report('concat_all ')  { N.times { [a, b, c].concat_all }}
  r.report('flat_map   ') { N.times {[a, b, c].flat_map(&:itself)} }
end

#large arrays
a = (1..10_000_000).to_a
b = (10_000_001..20_000_000).to_a
c = (20_000_001..30_000_000).to_a

Benchmark.bm do |r|
  r.report('plus       ')  { N.times { a + b + c }}
  r.report('concat     ') { N.times { [].concat(a).concat(b).concat(c) }}
  r.report('splash     ') { N.times {[*a, *b, *c]} }
  r.report('concat_all ')  { N.times { [a, b, c].concat_all }}
  r.report('flat_map   ') { N.times {[a, b, c].flat_map(&:itself)} }
end

This question is not about the different methods used; see the original question for that. In both situations MRI is 7 times faster! Can someone explain why?

C:\Users\...>d:\jruby\bin\jruby -J-Xmx1500M concat3.rb
       user     system      total        real
plus         0.000000   0.000000   0.000000 (  0.000946)
concat       0.000000   0.000000   0.000000 (  0.001436)
splash       0.000000   0.000000   0.000000 (  0.001456)
concat_all   0.000000   0.000000   0.000000 (  0.002177)
flat_map     0.010000   0.000000   0.010000 (  0.003179)
       user     system      total        real
plus       140.166000   0.000000 140.166000 (140.158687)
concat     143.475000   0.000000 143.475000 (143.473786)
splash     139.408000   0.000000 139.408000 (139.406671)
concat_all 144.475000   0.000000 144.475000 (144.474436)
flat_map   143.519000   0.000000 143.519000 (143.517636)

C:\Users\...>ruby concat3.rb
       user     system      total        real
plus         0.000000   0.000000   0.000000 (  0.000074)
concat       0.000000   0.000000   0.000000 (  0.000065)
splash       0.000000   0.000000   0.000000 (  0.000098)
concat_all   0.000000   0.000000   0.000000 (  0.000141)
flat_map     0.000000   0.000000   0.000000 (  0.000122)
       user     system      total        real
plus        15.226000   6.723000  21.949000 ( 21.958854)
concat      11.700000   9.142000  20.842000 ( 20.928087)
splash      21.247000  12.589000  33.836000 ( 33.933170)
concat_all  14.508000   8.315000  22.823000 ( 22.871641)
flat_map    11.170000   8.923000  20.093000 ( 20.170945)
Answers
nr: #1 added: 2016-11-11 10:11

The general rule (as mentioned in the comments) is that JRuby/JVM needs warmup.
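
One way to see the warmup effect is to time several successive rounds of the same block and compare them. This is only a minimal sketch: time_rounds is a hypothetical helper (not part of the Benchmark library), and the round and iteration counts are arbitrary assumptions. On a JIT-compiling runtime the later rounds would typically be faster than the first, but how much depends on the runtime and its settings.

```ruby
require 'benchmark'

# Hypothetical helper: times `rounds` successive batches of the same block,
# so a warmup effect shows up as a drop in the per-round time.
def time_rounds(rounds: 5, iterations: 100_000)
  Array.new(rounds) do
    Benchmark.realtime { iterations.times { yield } }
  end
end

a = (1..10).to_a
b = (11..20).to_a
c = (21..30).to_a

# Print the wall-clock time of each round for the 'plus' variant.
time_rounds { a + b + c }.each_with_index do |seconds, round|
  puts format('round %d: %.6f s', round + 1, seconds)
end
```

Benchmark.bmbm does something similar in a single step: it runs a rehearsal pass first and then reports a second, measured pass.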

Usually bmbm is a good fit, although TIMES=1000 should be increased (at least for the small-array cases). Also, 1.5G might not be enough for optimal JRuby performance (I noticed a considerable change in the numbers going from -Xmx2g to -Xmx3g). Here are the results:

ruby 2.3.1p112 (2016-04-26 revision 54768) [x86_64-linux]

$ ruby concat3.rb
Rehearsal -----------------------------------------------
plus          0.000000   0.000000   0.000000 (  0.000076)
concat        0.000000   0.000000   0.000000 (  0.000070)
splash        0.000000   0.000000   0.000000 (  0.000099)
concat_all    0.000000   0.000000   0.000000 (  0.000136)
flat_map      0.000000   0.000000   0.000000 (  0.000138)
-------------------------------------- total: 0.000000sec

                  user     system      total        real
plus          0.000000   0.000000   0.000000 (  0.000051)
concat        0.000000   0.000000   0.000000 (  0.000059)
splash        0.000000   0.000000   0.000000 (  0.000083)
concat_all    0.000000   0.000000   0.000000 (  0.000120)
flat_map      0.000000   0.000000   0.000000 (  0.000173)
Rehearsal -----------------------------------------------
plus         43.040000   3.320000  46.360000 ( 46.351004)
concat       15.080000   3.870000  18.950000 ( 19.228059)
splash       49.680000   4.820000  54.500000 ( 54.587707)
concat_all   51.840000   5.260000  57.100000 ( 57.114867)
flat_map     17.380000   5.340000  22.720000 ( 22.716987)
------------------------------------ total: 199.630000sec

                  user     system      total        real
plus         42.880000   3.600000  46.480000 ( 46.506013)
concat       17.230000   5.290000  22.520000 ( 22.890809)
splash       60.300000   7.480000  67.780000 ( 67.878534)
concat_all   54.910000   6.480000  61.390000 ( 61.404383)
flat_map     17.310000   5.570000  22.880000 ( 23.223789)

...

jruby 9.1.6.0 (2.3.1) 2016-11-09 0150a76 Java HotSpot(TM) 64-Bit Server VM 25.112-b15 on 1.8.0_112-b15 +jit [linux-x86_64]

$ jruby -J-Xmx3g concat3.rb
Rehearsal -----------------------------------------------
plus          0.010000   0.000000   0.010000 (  0.001445)
concat        0.000000   0.000000   0.000000 (  0.002534)
splash        0.000000   0.000000   0.000000 (  0.001791)
concat_all    0.000000   0.000000   0.000000 (  0.002513)
flat_map      0.010000   0.000000   0.010000 (  0.007088)
-------------------------------------- total: 0.020000sec

                  user     system      total        real
plus          0.010000   0.000000   0.010000 (  0.002700)
concat        0.000000   0.000000   0.000000 (  0.001085)
splash        0.000000   0.000000   0.000000 (  0.001569)
concat_all    0.000000   0.000000   0.000000 (  0.003052)
flat_map      0.000000   0.000000   0.000000 (  0.002252)
Rehearsal -----------------------------------------------
plus         32.410000   0.670000  33.080000 ( 17.385688)
concat       18.610000   0.060000  18.670000 ( 11.206419)
splash       57.770000   0.330000  58.100000 ( 25.366032)
concat_all   19.100000   0.030000  19.130000 ( 13.747319)
flat_map     16.160000   0.040000  16.200000 ( 10.534130)
------------------------------------ total: 145.180000sec

                  user     system      total        real
plus         16.060000   0.040000  16.100000 ( 11.737483)
concat       15.950000   0.030000  15.980000 ( 10.480468)
splash       47.870000   0.130000  48.000000 ( 22.668069)
concat_all   19.150000   0.030000  19.180000 ( 13.934314)
flat_map     16.850000   0.020000  16.870000 ( 10.862716)

... so it seems to be the opposite: for the large arrays, MRI 2.3 is 2-5x slower than JRuby 9.1
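
The ratio can be checked from the "real" column of the second (measured) pass on the large arrays. The sketch below just copies those numbers from the two outputs above; the constant names and the slowdown helper are made up for illustration.

```ruby
# Real (wall-clock) seconds from the second bmbm pass on the large arrays,
# copied from the MRI 2.3.1 and JRuby 9.1.6.0 outputs above.
MRI_REAL = {
  'plus'       => 46.506013,
  'concat'     => 22.890809,
  'splash'     => 67.878534,
  'concat_all' => 61.404383,
  'flat_map'   => 23.223789
}.freeze

JRUBY_REAL = {
  'plus'       => 11.737483,
  'concat'     => 10.480468,
  'splash'     => 22.668069,
  'concat_all' => 13.934314,
  'flat_map'   => 10.862716
}.freeze

# Hypothetical helper: how many times longer MRI took than JRuby, per method.
def slowdown(mri, jruby)
  mri.map { |name, seconds| [name, seconds / jruby.fetch(name)] }.to_h
end

slowdown(MRI_REAL, JRUBY_REAL).each do |name, ratio|
  puts format('%-10s MRI takes %.2fx as long as JRuby', name, ratio)
end
```

The small-array runs are left out on purpose: at these repeat counts they finish in well under a millisecond, so their ratios say little about steady-state speed.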

$ cat concat3.rb
require 'benchmark'
N = (ENV['TIMES'] || 100).to_i

class Array
  def concat_all
    self.reduce([], :+)
  end
end

# small arrays
a = (1..10).to_a
b = (11..20).to_a
c = (21..30).to_a

Benchmark.bmbm do |r|
  r.report('plus       ')  { N.times { a + b + c }}
  r.report('concat     ') { N.times { [].concat(a).concat(b).concat(c) }}
  r.report('splash     ') { N.times {[*a, *b, *c]} }
  r.report('concat_all ')  { N.times { [a, b, c].concat_all }}
  r.report('flat_map   ') { N.times {[a, b, c].flat_map(&:itself)} }
end

#large arrays
a = (1..10_000_000).to_a
b = (10_000_001..20_000_000).to_a
c = (20_000_001..30_000_000).to_a

Benchmark.bmbm do |r|
  r.report('plus       ')  { N.times { a + b + c }}
  r.report('concat     ') { N.times { [].concat(a).concat(b).concat(c) }}
  r.report('splash     ') { N.times {[*a, *b, *c]} }
  r.report('concat_all ')  { N.times { [a, b, c].concat_all }}
  r.report('flat_map   ') { N.times {[a, b, c].flat_map(&:itself)} }
end
nr: #2 added: 2016-11-11 22:11

What I have learned from these comments and answers, and from the tests I did myself afterward:

  • the OS probably makes a difference; I would have liked more answers from different setups, so here I'm just guessing
  • the fastest method differs between runtimes (MRI or jRuby, 32 or 64 bit, JRE), so claiming that one method is better than another is difficult; on my system the plus method was fastest in almost all circumstances, but I didn't use Java HotSpot like kares
  • in 64 bit jRuby you can specify a much higher heap than in 32 bit (1.5G on my system); in 64 bit I could use more heap than I have memory (a bug somewhere?)
  • higher heaps speed up operations using much memory like the huge arrays I used
  • use the latest Java runtime, speed is better
  • jRuby needs a warmup: a method needs to run a number of times before it is compiled, so use .bm and .bmbm with different repeat values to find that margin
  • Sometimes MRI is faster but with the right parameters and warmup jRuby was 3 to 3.5 times as fast on my system for this particular test

The last point, together with the time needed to load the JVM, makes MRI better for short ad hoc scripts and jRuby better for processing-hungry, longer-running processes with methods that are repeated often, so jRuby would be better for running servers and services.

What I saw confirmed: do your own benchmarks for long or repeated processes. Both implementations have made big improvements in speed compared to earlier versions. And let's not forget: Ruby may be a slower runner, but it is faster to develop in, and if you compare the cost of some extra hardware to that of some extra developers...
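
When running your own benchmarks, it helps to record the runtime details that the list above says influence the numbers. A small sketch, assuming you want this printed next to the results; runtime_summary is a made-up helper name, and the JVM details are only collected when the script runs under jRuby.

```ruby
# Hypothetical helper: collects engine, Ruby version and platform, plus the
# Java version and maximum heap when running under jRuby.
def runtime_summary
  info = {
    'engine'   => RUBY_ENGINE,
    'ruby'     => RUBY_VERSION,
    'platform' => RUBY_PLATFORM
  }
  if RUBY_ENGINE == 'jruby'
    require 'java'
    info['java'] = java.lang.System.getProperty('java.version')
    # maxMemory reflects the -Xmx setting (passed as -J-Xmx... to jruby).
    info['max heap (MB)'] = java.lang.Runtime.getRuntime.maxMemory / (1024 * 1024)
  end
  info
end

runtime_summary.each { |key, value| puts "#{key}: #{value}" }
```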

Thanks to all the commenters and kares for their expertise.
