Question: Should the loop limit be evaluated or stored?

Answers: 7
Added at 2016-12-29 09:12

Question
In C++, is it faster to store the limit of a loop in a variable than evaluating the value?

For example:

Is it a slower approach to use

for(int i=0; i<n*n+2*n; ++i)
{ .... }

than doing the following?

for(int i=0, limit=n*n+2*n; i<limit; ++i)
{ .... }

For clarity, assume that n is some given variable that remains unchanged during the course of the loop.

Answers

Answer #1, added 2016-12-29 09:12

It doesn't matter outside of a hot spot. I mean, yes, calculating the value only once will be faster in a debug build with no compiler optimizations, and at least as fast in release. But it usually doesn't matter. Do it the way that is easier to write, to read and to maintain. Let me quote Donald Knuth's famous words:

The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; premature optimization is the root of all evil (or at least most of it) in programming.

Having said that, I have recently come to prefer this form:

for (int i = 0, upperBound = n*n + 2*n /*or n*(n + 2)*/; i < upperBound; ++i)
{
}

This way the scope of upperBound is limited to the for statement itself and doesn't bleed into the outer scope where it's not needed.
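A minimal sketch of that scoping point (the function name is invented for illustration): any use of upperBound after the loop fails to compile, which is exactly the containment being described.

```cpp
int sum_iterations(int n)
{
    int sum = 0;
    for (int i = 0, upperBound = n * (n + 2); i < upperBound; ++i)
        sum += 1;
    // upperBound is out of scope here; referring to it would be a
    // compile-time error, so the name cannot leak into later code.
    return sum;
}
```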

Answer #2, added 2016-12-29 09:12

With optimizations disabled, yes: the first form re-evaluates n*n + 2*n on every iteration, while the second computes the limit once at the start and reuses it for all iterations. With optimizations enabled, however, a compiler that can prove the bound is loop-invariant will usually hoist the computation out of the loop itself, and the two forms end up equivalent.

Answer #3, added 2016-12-29 09:12

If n is a globally declared non-volatile variable, then the behaviour of

for (int i = 0; i < n * n + 2 * n; ++i)

is unspecified. A compiler is allowed to optimise n * n + 2 * n to be evaluated once even if another thread modifies n. Furthermore, if another thread is able to modify n, then you should take steps to avoid the potential for simultaneous reads and writes of n (the behaviour of which is undefined). Consider using std::atomic<int> as the type for n.
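To make the "observable value of n" idea concrete, here is a minimal sketch (the function name and the initial value of n are invented for illustration) that reads the std::atomic<int> exactly once before the loop:

```cpp
#include <atomic>

// Hypothetical shared variable; another thread may write to it.
std::atomic<int> n{3};

int count_iterations()
{
    // One well-defined read of n; the loop bound is now a fixed snapshot,
    // unaffected by any concurrent writes while the loop runs.
    const int v = n.load();
    const int limit = v * v + 2 * v;
    int iterations = 0;
    for (int i = 0; i < limit; ++i)
        ++iterations;
    return iterations;
}
```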

So really it's appropriate to introduce limit anyway if you want the stopping condition to be dependent on the value of n observable when program control reaches the for loop, irrespective of any performance considerations. Consider therefore

for (int i = 0, limit = n * n + 2 * n; i < limit; ++i)

which has the advantage that the scope of limit doesn't leak into the surrounding statements.

But if you are able to, you could always run the loop backwards:

for (int i = n * n + 2 * n - 1; i >= 0; --i)

Be very careful though with unsigned types if you adopt that idea.
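A short sketch of that unsigned pitfall (function names invented for illustration): with an unsigned index, the test i >= 0 is always true because decrementing past 0 wraps around, so the usual fix is to decrement inside the condition instead.

```cpp
// Counting down with a signed index terminates normally.
int count_down_signed(int n)
{
    int iterations = 0;
    for (int i = n * n + 2 * n - 1; i >= 0; --i)
        ++iterations;
    return iterations;
}

// With an unsigned index, i >= 0 is always true: decrementing past 0
// wraps to a huge value. The usual fix is to decrement in the condition,
// so the body runs with i = limit-1 down to 0 and the loop still stops.
unsigned count_down_unsigned(unsigned n)
{
    unsigned iterations = 0;
    for (unsigned i = n * n + 2 * n; i-- > 0; )
        ++iterations;
    return iterations;
}
```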

Answer #4, added 2016-12-29 09:12

Before speed you have to consider correctness. Suppose the upper limit depends on the size of an array, and the loop body changes that array (adds or removes elements). In that case it is more "robust" to write

for (int i=0; i < n*n+2*n; i++)

because the loop condition is re-evaluated on every iteration. I prefer to profile first if I feel there is a performance issue.
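As a minimal illustration of that point (the helper name is hypothetical), erasing from a std::vector inside the loop only works correctly because v.size() is re-evaluated on every pass:

```cpp
#include <cstddef>
#include <vector>

// Remove every negative value. The bound v.size() must be re-evaluated
// on each pass because erase() shrinks the vector as we go.
void erase_negatives(std::vector<int>& v)
{
    for (std::size_t i = 0; i < v.size(); )   // bound re-checked every pass
    {
        if (v[i] < 0)
            v.erase(v.begin() + static_cast<std::ptrdiff_t>(i));  // size shrinks; keep i in place
        else
            ++i;
    }
}
```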

Answer #5, added 2016-12-29 09:12

The answer depends on the ability of the compiler to optimize the computation of the limit, i.e. to perform the same optimization that you were suggesting in code (in general, a compiler will try to perform such trivial optimizations).

If the compiler cannot assert whether the limit changes during loop execution or whether its computation has global side effects, then it cannot perform this optimization. In such a case, if you know (at compile time) that in fact there are no side effects and the limit is not changed, then pre-computing the limit is a sensible optimization. Examples:

// file foo.cc

extern int non_local_int;               // access can be optimized
extern volatile int volatile_int;       // access must not be optimized
extern int bar(int);                    // may have global side effects
extern void take_addr(int&);            // may store address
namespace {
  int addr_never_taken_int=10;          // never operand of address-of operator
  int addr_taken_int=10;                // used as operand of address-of op.
}

void foo(int n)
{
  for(int i=0; i<n*n+n+n+1; ++i)        // can be optimized
  { /* ... */ }

  int local_int = bar(n);
  for(int i=0; i<n*local_int; ++i)      // can be optimized
  { /* ... */ }

  for(int i=0; i<n*non_local_int; ++i)  // can be optimized, but is not threadsafe
  { /* ... no calls to outside code */ }

  for(int i=0; i<n*bar(n); ++i)         // cannot be optimized
  { /* ... */ }

  for(int i=0; i<addr_never_taken_int; ++i)  // can be optimized
  { /* ... */ }

  take_addr(addr_taken_int);
  for(int i=0; i<addr_taken_int; ++i)   // cannot be optimized
  { /* ... code that calls *any* outside function */ }

  for(int i=0; i<n*volatile_int; ++i)   // must not be optimized
  { /* ... */ }

  for(int i=0; i<n; ++i)                // can be optimized
  { /* ... code that calls *any* outside function */ }

  take_addr(n);
  for(int i=0; i<n; ++i)                // cannot be optimized
  { /* ... code that calls *any* outside function */ }
}

Edited to reflect comments made by supercat. Note that volatile objects are suitable for communication within a single thread of execution (e.g. with a signal handler), but not with another thread. Threadsafety is the responsibility of the programmer.

Answer #6, added 2016-12-29 11:12

In situations like this I always count down to zero or another constant unless the order of operations matters, e.g.

for (int i = n*n + 2*n - 1; i >= 0; --i)
{ ... }

For me this makes it easier to see how long the loop will take, and it is less likely to have any 'off by one' errors. In other words, the behaviour is fully defined right at the start of the loop, without worrying whether n is going to change or the index is going to run past the end of an array.

Answer #7, added 2016-12-29 12:12

I have tried this using simple C# code in Visual Studio, stepping through with the debugger:

for(int i=0, limit=n*n+2*n; i<limit; ++i)

Here limit is assigned only once, whereas in for(int i=0; i<n*n+2*n; ++i) the limit is recalculated on every iteration. That is why, for n > 0, the second form, where limit is stored, was noticeably faster in my observation.
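For completeness, a small self-contained sketch (function names invented for illustration) confirming that both forms perform the same number of iterations when n is unchanged; any measured difference comes only from how often the bound expression is evaluated, not from what the loop does.

```cpp
// Bound re-evaluated on every iteration.
int iterate_recomputed(int n)
{
    int count = 0;
    for (int i = 0; i < n * n + 2 * n; ++i)
        ++count;
    return count;
}

// Bound computed once, before the first iteration.
int iterate_stored(int n)
{
    int count = 0;
    for (int i = 0, limit = n * n + 2 * n; i < limit; ++i)
        ++count;
    return count;
}
```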
