Question: Java program runs slower when code that is never executed is commented out

Question

Java program runs slower when code that is never executed is commented out

Answers 3
Added at 2016-12-31 19:12
Tags
Question

I observed some strange behaviour in one of my Java programs. I have tried to strip the code down as much as possible while still being able to replicate the behaviour. Code in full below.

public class StrangeBehaviour {

    static boolean recursionFlag = true;

    public static void main(String[] args) {
        long startTime = System.nanoTime();
        for (int i = 0; i < 10000; i ++) {
            functionA(6, 0);
        }
        long endTime = System.nanoTime();
        System.out.format("%.2f seconds elapsed.\n", (endTime - startTime) / 1000.0 / 1000 / 1000);
    }

    static boolean functionA(int recursionDepth, int recursionSwitch) {
        if (recursionDepth == 0) { return true; }
        return functionB(recursionDepth, recursionSwitch);
    }

    static boolean functionB(int recursionDepth, int recursionSwitch) {
        for (int i = 0; i < 16; i++) {
            if (StrangeBehaviour.recursionFlag) {
                if (recursionSwitch == 0) {
                    if (functionA(recursionDepth - 1, 1 - recursionSwitch)) return true;
                } else {
                    if (!functionA(recursionDepth - 1, 1 - recursionSwitch)) return false;
                }
            } else {
                // This block is never entered into.
                // Yet commenting out one of the lines below makes the program run slower!
                System.out.println("...");
                System.out.println("...");
                System.out.println("...");
                System.out.println("...");
                System.out.println("...");
                System.out.println("...");
                System.out.println("...");
                System.out.println("...");
                System.out.println("...");
                System.out.println("...");
                System.out.println("...");
                System.out.println("...");
                System.out.println("...");
                System.out.println("...");
                System.out.println("...");
                System.out.println("...");
                System.out.println("...");
                System.out.println("...");
                System.out.println("...");
                System.out.println("...");
                System.out.println("...");
                System.out.println("...");
                System.out.println("...");
                System.out.println("...");
                System.out.println("...");
                System.out.println("...");
                System.out.println("...");
                System.out.println("...");
                System.out.println("...");
                System.out.println("...");
                System.out.println("...");
                System.out.println("...");
                System.out.println("...");
                System.out.println("...");
            }
        }
        return false;
    }
}

I have two functions, functionA() and functionB() that call each other recursively. Both functions take a recursionDepth parameter that controls the termination of the recursion. functionA() calls functionB() a maximum of once with recursionDepth unchanged. functionB() calls functionA() 16 times with recursionDepth - 1. The recursion terminates when functionA() is called with a recursionDepth of 0.

functionB() has a code block with a number of System.out.println() calls. This block is never entered into, as entry is controlled by a boolean recursionFlag variable that is set to true and never changed during program execution. However, commenting out even one of the println() calls causes the program to run slower. On my machine, execution time is <0.2s with all the println() calls present, and >2s when one of the calls is commented out.

What could be causing this behaviour? My only guess is that there is some naive compiler optimisation that is being triggered by a parameter related to the length of the code block (or the number of function calls etc.). Any further insight into this will be much appreciated!

Edit: I'm using JDK 1.8.

Answers
nr: #1 dodano: 2016-12-31 20:12

The commented code affects how inlining is handled. If functionB gets longer/bigger (more bytecode instructions) it will not be inlined into functionA.

So @J3D1 was able to use VMOptions to manually switch off inlining for functionB(): -XX:CompileCommand=dontinline,com.jd.benchmarking.StrangeBeh‌​aviour::functionB This appears to eliminate the delay with the shorter function.

with the vm options you can display inlining -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining

The bigger Version, wont inline functionB

@ 8   StrangeBehaviour::functionB (326 bytes)   callee is too large
@ 21   StrangeBehaviour::functionA (12 bytes)
  @ 8   StrangeBehaviour::functionB (326 bytes)   callee is too large
@ 35   StrangeBehaviour::functionA (12 bytes)
  @ 8   StrangeBehaviour::functionB (326 bytes)   callee is too large

The shorter Version will try to inline functionB, causing a few of further attempts.

@ 8   StrangeBehaviour::functionB (318 bytes)   inline (hot)
 @ 21   StrangeBehaviour::functionA (12 bytes)   inline (hot)
   @ 8   StrangeBehaviour::functionB (318 bytes)   inline (hot)
     @ 35   StrangeBehaviour::functionA (12 bytes)   recursive inlining is too deep
 @ 35   StrangeBehaviour::functionA (12 bytes)   inline (hot)
   @ 8   StrangeBehaviour::functionB (318 bytes)   inline (hot)
     @ 21   StrangeBehaviour::functionA (12 bytes)   recursive inlining is too deep
     @ 35   StrangeBehaviour::functionA (12 bytes)   recursive inlining is too deep
@ 21   StrangeBehaviour::functionA (12 bytes)   inline (hot)
 @ 8   StrangeBehaviour::functionB (318 bytes)   inline (hot)
   @ 35   StrangeBehaviour::functionA (12 bytes)   inline (hot)
     @ 8   StrangeBehaviour::functionB (318 bytes)   recursive inlining is too deep
@ 35   StrangeBehaviour::functionA (12 bytes)   inline (hot)
 @ 8   StrangeBehaviour::functionB (318 bytes)   inline (hot)
   @ 21   StrangeBehaviour::functionA (12 bytes)   inline (hot)
    @ 8   StrangeBehaviour::functionB (318 bytes)   recursive inlining is too deep
   @ 35   StrangeBehaviour::functionA (12 bytes)   inline (hot)
     @ 8   StrangeBehaviour::functionB (318 bytes)   recursive inlining is too deep

Mostly guessing, but the bigger/inlined bytecode will cause problems with branch prediction and caching

nr: #2 dodano: 2017-01-01 06:01

I am with @k5_, it seems that there exists a threshold to determine whether to inline a function. And if JIT compiler decided to inline it, it will cause a lot of work and time to do it as -XX:+PrintCompilation shows:

  task-id
    158   32       3       so_test.StrangeBehaviour::functionB (326 bytes)   made not entrant
    159   35       3       java.lang.String::<init> (82 bytes)
    160   36  s    1       java.util.Vector::size (5 bytes)
    1878   37 %     3       so_test.StrangeBehaviour::main @ 6 (65 bytes)
    1898   38       3       so_test.StrangeBehaviour::main (65 bytes)
    2665   39       3       java.util.regex.Pattern::has (15 bytes)
    2667   40       3       sun.misc.FDBigInteger::mult (64 bytes)
    2668   41       3       sun.misc.FDBigInteger::<init> (30 bytes)
    2668   42       3       sun.misc.FDBigInteger::trimLeadingZeros (57 bytes)
    2.51 seconds elapsed.

The upper is info without comment, the following is with a comment which reduce the method size from 326 byte to 318 byte. And you can notice the task id in the column 1 of output, is so much larger in latter, which causes more time.

  task-id
    126   35       4       so_test.StrangeBehaviour::functionA (12 bytes)
    130   33       3       so_test.StrangeBehaviour::functionA (12 bytes)   made not entrant
    131   36  s    1       java.util.Vector::size (5 bytes)
    14078   37 %     3       so_test.StrangeBehaviour::main @ 6 (65 bytes)
    14296   38       3       so_test.StrangeBehaviour::main (65 bytes)
    14296   39 %     4       so_test.StrangeBehaviour::functionB @ 2 (318 bytes)
    14300   40       4       so_test.StrangeBehaviour::functionB (318 bytes)
    14304   34       3       so_test.StrangeBehaviour::functionB (318 bytes)   made not entrant
    14628   41       3       java.util.regex.Pattern::has (15 bytes)
    14631   42       3       sun.misc.FDBigInteger::mult (64 bytes)
    14632   43       3       sun.misc.FDBigInteger::<init> (30 bytes)
    14632   44       3       sun.misc.FDBigInteger::trimLeadingZeros (57 bytes)
    14.50 seconds elapsed.

And if you change the code into following (add two line and commnet out a print line), you can see that the code size change into 326 byte, and runs faster now:

        if (StrangeBehaviour.recursionFlag) {
            int a = 1;
            int b = 1;
            if (recursionSwitch == 0) {
                if (functionA(recursionDepth - 1, 1 - recursionSwitch)) return true;
            } else {
                if (!functionA(recursionDepth - 1, 1 - recursionSwitch)) return false;
            }
        } else {
            // This block is never entered into.
            // Yet commenting out one of the lines below makes the program run slower!
            System.out.println("...");
            System.out.println("...");
            System.out.println("...");
          //System.out.println("...");
            System.out.println("...");
            System.out.println("...");
            System.out.println("...");
            System.out.println("...");
            System.out.println("...");
            System.out.println("...");
            System.out.println("...");
            System.out.println("...");
            System.out.println("...");
            System.out.println("...");
            System.out.println("...");
            System.out.println("...");
            System.out.println("...");
            System.out.println("...");
            System.out.println("...");
            System.out.println("...");
            System.out.println("...");
            System.out.println("...");
            System.out.println("...");
            System.out.println("...");
            System.out.println("...");
            System.out.println("...");
            System.out.println("...");
            System.out.println("...");
            System.out.println("...");
            System.out.println("...");
            System.out.println("...");
            System.out.println("...");
            System.out.println("...");
            System.out.println("...");
        }

New time and JIT compiler info:

    140   34       3       so_test.StrangeBehaviour::functionB (326 bytes)   made not entrant
    145   36       3       java.lang.String::<init> (82 bytes)
    148   37  s    1       java.util.Vector::size (5 bytes)
    162   38       4       so_test.StrangeBehaviour::functionA (12 bytes)
    163   33       3       so_test.StrangeBehaviour::functionA (12 bytes)   made not entrant
    1916   39 %     3       so_test.StrangeBehaviour::main @ 6 (65 bytes)
    1936   40       3       so_test.StrangeBehaviour::main (65 bytes)
    2686   41       3       java.util.regex.Pattern::has (15 bytes)
    2689   42       3       sun.misc.FDBigInteger::mult (64 bytes)
    2690   43       3       sun.misc.FDBigInteger::<init> (30 bytes)
    2690   44       3       sun.misc.FDBigInteger::trimLeadingZeros (57 bytes)
    2.55 seconds elapsed.

In conclusion:

  • When method size exceeds some limits, JIT will not inline this function;
  • And if we comment out a line, which reduce to a size below the threshold, JIT decide to inline it;
  • Inlining that function causes a lot of JIT tasks which slows down the program.

update:

Accroding to my latest trial, the answer for this question is not so easy:

As my code sample shows, a normal inline optimization will

  • accelerates program
  • and not costs much compiler work (in my test, it even cost less work when inline happens).

But in this problem, the code causes much JIT work and slows down program which seems a bug of JIT. And it is still not clear why it causes so much work of JIT.

nr: #3 dodano: 2017-01-02 07:01

The complete answer is a combination of k5_ and Tony's answers.

The code that the OP posted omits a warmup loop to trigger HotSpot compilation before doing the benchmark; hence the 10-fold (on my computer) speedup when the print statements are included, combines both the time spent in HotSpot to compile the bytecode to CPU instructions, as well as the actual running of the CPU instructions.

If I add a separate warmup loop before the timing loop, there is only a 2.5-fold speedup with the print statement.

That indicates that both the HotSpot/JIT compilation takes longer when the method is inlined (as Tony explained) as well as that the running of the code takes longer, probably because of worse cache or branch-prediction/pipelining performance, as k5_ showed.

public static void main(String[] args) {
    // Added the following warmup loop before the timing loop
    for (int i = 0; i < 50000; i++) {
        functionA(6, 0);
    }

    long startTime = System.nanoTime();
    for (int i = 0; i < 50000; i++) {
        functionA(6, 0);
    }
    long endTime = System.nanoTime();
    System.out.format("%.2f seconds elapsed.\n", (endTime - startTime) / 1000.0 / 1000 / 1000);
}
Source Show
◀ Wstecz