Question: Unexpected performance of a single "for" loop in Rcpp

Question

Unexpected performance of a single "for" loop in Rcpp

Answers 1
Added at 2016-12-31 16:12
Tags
Question

Can somebody explain to me somewhat odd behavior of simple for loop written using Rcpp (code is below). Based on microbenchmark output it seems that the algorithmic complexity of the for_iteration is constant which is not true based on its code. For comparison I tested function for_double_iteration and its behavior is consistent with its code complexity. This code was run on Ubuntu 16.04 and CPU Intel Core i3-6100 but the same result was obtained on Windows 7 and CPU Intel Core i5-2300.

Here is the code:

library(Rcpp)
library(microbenchmark)
 sourceCpp(code='
  #include <Rcpp.h>

  // [[Rcpp::export]]
  int for_iteration(const int n) {
    int j = 0;
    for (int i = 0; i < n; i++) {
      j++;
    }
    return (j);
  }

  // [[Rcpp::export]]
  int for_double_iteration(const int n) {
    int j = 0, k = 0;
    for (int i = 0; i < n; i++) {
      for (k = 0; k < i; k++) {
        j++;
      }
    }
    return (j);
  }'
)

Here are the benchmarks:

> microbenchmark(for_iteration(10^5),
                 for_iteration(10^6),
                 for_iteration(10^7),
                 for_iteration(10^8),
                 for_iteration(10^9),
                 times=1000, unit="us")
Unit: microseconds
                expr   min    lq     mean median     uq      max neval
 for_iteration(10^5) 1.254 1.379 1.552229 1.4305 1.5255    9.197  1000
 for_iteration(10^6) 1.268 1.379 1.724993 1.4300 1.5410  123.822  1000
 for_iteration(10^7) 1.274 1.377 3.687182 1.4240 1.5075 2126.909  1000
 for_iteration(10^8) 1.253 1.387 1.546345 1.4360 1.5320    9.527  1000
 for_iteration(10^9) 1.265 1.386 1.568382 1.4300 1.5230   20.307  1000

> microbenchmark(for_double_iteration(10^2),
                for_double_iteration(10^3),
                for_double_iteration(10^4),
                for_double_iteration(10^5),
                for_double_iteration(10^6),
                times=1000, unit="us")
Unit: microseconds
                       expr     min       lq       mean   median       uq      max neval
 for_double_iteration(10^2)   0.921   1.0230   1.516304   1.1100   1.4335   23.308  1000
 for_double_iteration(10^3)   1.722   1.7970   2.491999   1.8915   2.2270   49.772  1000
 for_double_iteration(10^4)   9.022   9.1165   9.947209   9.2050   9.6925   55.841  1000
 for_double_iteration(10^5)  82.170  82.2700  86.240153  82.3590  82.9070 1959.903  1000
 for_double_iteration(10^6) 813.723 814.4450 834.870686 826.4625 828.2280 1178.062  1000
Answers
nr: #1 dodano: 2016-12-31 17:12

My guess is correct, it is optimized out and replaced by j = n.

Have a check on godbolt, with compiler flag -O2, you get assembly output (gas):

for_iteration(int):
        test    edi, edi
        mov     eax, 0
        cmovns  eax, edi
        ret

Oops... there is no loop at all!!

Source Show
◀ Wstecz