Question: Why is sum so much faster than inject(:+)?

Question

Why is sum so much faster than inject(:+)?

Answers 1
Added at 2017-01-03 18:01
Tags
Question

So I was running some benchmarks in Ruby 2.4.0 and realized that

(1...1000000000000000000000000000000).sum

calculates immediately whereas

(1...1000000000000000000000000000000).inject(:+)

takes so long that I just aborted the operation. I was under the impression that Range#sum was an alias for Range#inject(:+) but it seems like that is not true. So how does sum work, and why is it so much faster than inject(:+)?

N.B. The documentation for Enumerable#sum (which is implemented by Range) does not say anything about lazy evaluation or anything along those lines.

Answers to

Why is sum so much faster than inject(:+)?

nr: #1 dodano: 2017-01-03 19:01

Short answer

For an integer range :

  • Enumerable#sum returns (range.max-range.min+1)*(range.max+range.min)/2
  • Enumerable#inject(:+) iterates over every element.

Theory

The sum of integers between 1 and n is called a triangular number, and is equal to n*(n+1)/2.

The sum of integers between n and m is the triangular number of m minus the triangular number of n-1, which is equal to m*(m+1)/2-n*(n-1)/2, and can be written (m-n+1)*(m+n)/2.

Enumerable#sum in Ruby 2.4

This property in used in Enumerable#sum for integer ranges :

if (RTEST(rb_range_values(obj, &beg, &end, &excl))) {
    if (!memo.block_given && !memo.float_value &&
            (FIXNUM_P(beg) || RB_TYPE_P(beg, T_BIGNUM)) &&
            (FIXNUM_P(end) || RB_TYPE_P(end, T_BIGNUM))) { 
        return int_range_sum(beg, end, excl, memo.v);
    } 
}

int_range_sum looks like this :

VALUE a;
a = rb_int_plus(rb_int_minus(end, beg), LONG2FIX(1));
a = rb_int_mul(a, rb_int_plus(end, beg));
a = rb_int_idiv(a, LONG2FIX(2));
return rb_int_plus(init, a);

which is equivalent to:

(range.max-range.min+1)*(range.max+range.min)/2

the aforementioned equality!

Complexity

Thanks a lot to @k_g and @Hynek-Pichi-Vychodil for this part!

sum

(1...1000000000000000000000000000000).sum requires three additions, a multiplication, a substraction and a division.

It's a constant number of operations, but multiplication is O((log n)²), so Enumerable#sum is O((log n)²) for an integer range.

inject

(1...1000000000000000000000000000000).inject(:+)

requires 999999999999999999999999999998 additions!

Addition is O(log n), so Enumerable#inject is O(n log n).

With 1E30 as input, inject with never return. The sun will explode long before!

Test

It's easy to check if Ruby Integers are being added :

module AdditionInspector
  def +(b)
    puts "Calculating #{self}+#{b}"
    super
  end
end

class Integer
  prepend AdditionInspector
end

puts (1..5).sum
#=> 15

puts (1..5).inject(:+)
# Calculating 1+2
# Calculating 3+3
# Calculating 6+4
# Calculating 10+5
#=> 15

Indeed, from enum.c comments :

Enumerable#sum method may not respect method redefinition of "+" methods such as Integer#+.

Source Show
◀ Wstecz