Question: java-9 weakCompareAndSwap vs compareAndSwap

Answers: 2
Added at 2016-12-29 15:12
Question

This question is not about the semantic difference between them: I know what a spurious failure is and why it happens on LL/SC architectures. My question is: if I'm on Intel x86 and using Java 9 (build 149), why is there a difference between their assembly code?
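For context, the relationship between the two operations can be shown with a minimal sketch: a strong CAS can be emulated by retrying a weak CAS until it either succeeds or the value has genuinely changed. On x86, lock cmpxchg never fails spuriously, which is why one might expect both calls to compile to the same instruction there. The sketch uses the public AtomicInteger.weakCompareAndSetVolatile (Java 9+) rather than the Unsafe methods from the question, and strongCas is a hypothetical helper name.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class StrongFromWeak {

    // Hypothetical helper: emulates a strong CAS using only weak CAS attempts.
    // A weak CAS may fail even when the current value equals `expected`
    // (a spurious failure, e.g. a lost reservation on LL/SC hardware),
    // so false is only returned once the value has really changed.
    static boolean strongCas(AtomicInteger a, int expected, int newValue) {
        while (true) {
            if (a.weakCompareAndSetVolatile(expected, newValue)) {
                return true;            // swap succeeded
            }
            if (a.get() != expected) {
                return false;           // genuine failure: the value differs
            }
            // value still equals expected, so the failure was spurious: retry
        }
    }

    public static void main(String[] args) {
        AtomicInteger a = new AtomicInteger(33);
        System.out.println(strongCas(a, 33, 34)); // true, a is now 34
        System.out.println(strongCas(a, 33, 35)); // false, a stays 34
        System.out.println(a.get());              // 34
    }
}
```

The extra get() is what separates a spurious failure (value unchanged, retry) from a real one (value changed, report false).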

public class WeakVsNonWeak {

    static jdk.internal.misc.Unsafe UNSAFE = jdk.internal.misc.Unsafe.getUnsafe();

    public static void main(String[] args) throws NoSuchFieldException, SecurityException {

        Holder h = new Holder();
        h.setValue(33);
        Class<?> holderClass = Holder.class;
        long valueOffset = UNSAFE.objectFieldOffset(holderClass.getDeclaredField("value"));

        int result = 0;
        for (int i = 0; i < 30_000; ++i) {
            result = strong(h, valueOffset);
        }
        System.out.println(result);

    }

    private static int strong(Holder h, long offset) {
        int sum = 0;
        for (int i = 33; i < 11_000; ++i) {
            boolean result = UNSAFE.weakCompareAndSwapInt(h, offset, i, i + 1);
            if (!result) {
                sum++;
            }
        }
        return sum;

    }

    public static class Holder {

        private int value;

        public int getValue() {
            return value;
        }

        public void setValue(int value) {
            this.value = value;
        }
    }
}
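If the goal is only to compare the generated code, a variant of the test built on the public VarHandle API (Java 9+) avoids jdk.internal.misc.Unsafe and the --add-opens flag. This is a sketch, not the original test: the class name WeakVsNonWeakVarHandle and the strongLoop/weakLoop helpers are assumptions, and whether C2 emits the same instructions for these VarHandle calls as for the Unsafe calls is not verified here.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class WeakVsNonWeakVarHandle {

    static class Holder {
        int value;

        // VarHandle for Holder.value; the lookup runs inside Holder itself.
        static final VarHandle VALUE;
        static {
            try {
                VALUE = MethodHandles.lookup()
                        .findVarHandle(Holder.class, "value", int.class);
            } catch (ReflectiveOperationException e) {
                throw new ExceptionInInitializerError(e);
            }
        }
    }

    // Same loop as strong(...) in the question, using the strong CAS.
    static int strongLoop(Holder h) {
        int failures = 0;
        for (int i = 33; i < 11_000; ++i) {
            if (!Holder.VALUE.compareAndSet(h, i, i + 1)) {
                failures++;
            }
        }
        return failures;
    }

    // Same loop with the weak CAS, kept in its own method so its compiled
    // code shows up separately in the -XX:+PrintAssembly output.
    static int weakLoop(Holder h) {
        int failures = 0;
        for (int i = 33; i < 11_000; ++i) {
            if (!Holder.VALUE.weakCompareAndSet(h, i, i + 1)) {
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        Holder strongHolder = new Holder();
        Holder weakHolder = new Holder();
        strongHolder.value = 33;
        weakHolder.value = 33;
        int strongResult = 0;
        int weakResult = 0;
        // Enough calls to get both loops JIT-compiled, as in the question.
        for (int i = 0; i < 30_000; ++i) {
            strongResult = strongLoop(strongHolder);
            weakResult = weakLoop(weakHolder);
        }
        System.out.println(strongResult + " " + weakResult);
    }
}
```

As in the original, the holder is reused, so after the first call every CAS in later calls fails (the value is already 11000).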

Running with:

 java -XX:-TieredCompilation \
      -XX:CICompilerCount=1 \
      -XX:+UnlockDiagnosticVMOptions \
      -XX:+PrintIntrinsics \
      -XX:+PrintAssembly \
      --add-opens java.base/jdk.internal.misc=ALL-UNNAMED \
      WeakVsNonWeak

Output of compareAndSwapInt (relevant parts):

  0x0000000109f0f4b8: movabs $0x111927c18,%rsi  ;   {metadata({method} {0x0000000111927c18} 'compareAndSwapInt' '(Ljava/lang/Object;JII)Z' in 'jdk/internal/misc/Unsafe')}
  0x0000000109f0f4c2: mov    %r15,%rdi
  0x0000000109f0f4c5: test   $0xf,%esp
  0x0000000109f0f4cb: je     0x0000000109f0f4e3
  0x0000000109f0f4d1: sub    $0x8,%rsp
  0x0000000109f0f4d5: callq  0x00000001098569d2  ;   {runtime_call SharedRuntime::dtrace_method_entry(JavaThread*, Method*)}
  0x0000000109f0f4da: add    $0x8,%rsp
  0x0000000109f0f4de: jmpq   0x0000000109f0f4e8
  0x0000000109f0f4e3: callq  0x00000001098569d2  ;   {runtime_call SharedRuntime::dtrace_method_entry(JavaThread*, Method*)}
  0x0000000109f0f4e8: pop    %r9
  0x0000000109f0f4ea: pop    %r8
  0x0000000109f0f4ec: pop    %rcx
  0x0000000109f0f4ed: pop    %rdx
  0x0000000109f0f4ee: pop    %rsi
  0x0000000109f0f4ef: lea    0x210(%r15),%rdi
  0x0000000109f0f4f6: movl   $0x4,0x288(%r15)
  0x0000000109f0f501: callq  0x00000001098fee40  ;   {runtime_call Unsafe_CompareAndSwapInt(JNIEnv_*, _jobject*, _jobject*, long, int, int)}
  0x0000000109f0f506: vzeroupper 
  0x0000000109f0f509: and    $0xff,%eax
  0x0000000109f0f50f: setne  %al
  0x0000000109f0f512: movl   $0x5,0x288(%r15)
  0x0000000109f0f51d: lock addl $0x0,-0x40(%rsp)
  0x0000000109f0f523: cmpl   $0x0,-0x3f04dd(%rip)        # 0x0000000109b1f050

Output of weakCompareAndSwapInt:

  0x000000010b698840: sub    $0x18,%rsp
  0x000000010b698847: mov    %rbp,0x10(%rsp)
  0x000000010b69884c: mov    %r8d,%eax
  0x000000010b69884f: lock cmpxchg %r9d,(%rdx,%rcx,1)
  0x000000010b698855: sete   %r11b
  0x000000010b698859: movzbl %r11b,%r11d        ;*invokevirtual compareAndSwapInt {reexecute=0 rethrow=0 return_oop=0}
                                                ; - jdk.internal.misc.Unsafe::weakCompareAndSwapInt@7 (line 1369)

I am not nearly versed enough to understand the entire output, but I can definitely see the difference between lock addl and lock cmpxchg.
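For readers decoding the weak listing: in AT&T syntax, mov %r8d,%eax appears to put the expected value in EAX, lock cmpxchg %r9d,(%rdx,%rcx,1) compares EAX with the int at object + offset and stores %r9d on a match, and sete turns the zero flag into the boolean result. The class below is only a Java model of those semantics, assuming %r8d holds the expected value and %r9d the new one; CmpxchgModel is a hypothetical name and synchronized merely stands in for the atomicity of the lock prefix.

```java
public class CmpxchgModel {

    // Illustrative stand-in for the memory operand (%rdx,%rcx,1).
    private int memory;

    CmpxchgModel(int initial) {
        memory = initial;
    }

    // Models `lock cmpxchg src,(mem)` with the expected value in EAX.
    // If memory == EAX, src is stored (ZF = 1); otherwise the current
    // memory value is loaded into EAX (ZF = 0). In both cases EAX ends up
    // holding the old memory value, which is what this method returns.
    synchronized int cmpxchg(int eax, int src) {
        int old = memory;
        if (old == eax) {
            memory = src;
        }
        return old;
    }

    // Models cmpxchg + sete + movzbl: the boolean is just ZF,
    // i.e. whether the old value matched the expected one.
    boolean cas(int expected, int newValue) {
        return cmpxchg(expected, newValue) == expected;
    }

    synchronized int get() {
        return memory;
    }

    public static void main(String[] args) {
        CmpxchgModel m = new CmpxchgModel(33);
        System.out.println(m.cas(33, 34)); // true
        System.out.println(m.cas(33, 35)); // false
        System.out.println(m.get());       // 34
    }
}
```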

EDIT: Peter's answer got me thinking. Let's see whether compareAndSwapInt is compiled as an intrinsic:

-XX:+PrintIntrinsics -XX:-PrintAssembly

 @ 7   jdk.internal.misc.Unsafe::compareAndSwapInt (0 bytes)   (intrinsic)
 @ 20      jdk.internal.misc.Unsafe::weakCompareAndSwapInt (11 bytes)   (intrinsic).

Then I ran the example twice, with and without:

-XX:DisableIntrinsic=_compareAndSwapInt

This is somewhat weird: the output is exactly the same (the same instructions), the only difference being that with the intrinsic enabled I get calls like this:

  0x000000010c23e355: callq  0x00000001016569d2  ;   {runtime_call SharedRuntime::dtrace_method_entry(JavaThread*, Method*)}
  0x000000010c23e381: callq  0x00000001016fee40  ;   {runtime_call Unsafe_CompareAndSwapInt(JNIEnv_*, _jobject*, _jobject*, long, int, int)}

And disabled:

  0x00000001109322d5: callq  0x0000000105c569d2  ;   {runtime_call _ZN13SharedRuntime19dtrace_method_entryEP10JavaThreadP6Method}
  0x00000001109322e3: callq  0x0000000105c569d2  ;   {runtime_call _ZN13SharedRuntime19dtrace_method_entryEP10JavaThreadP6Method}

This is rather intriguing: shouldn't the intrinsic code be different?

EDIT 2: the8472's answer makes sense too.

As far as I know, lock addl is a substitute for mfence on x86 that flushes the store buffer, so it has to do with visibility, not atomicity. Right before this entry is:

 0x00000001133db6f6: movl   $0x4,0x288(%r15)
 0x00000001133db701: callq  0x00000001060fee40  ;   {runtime_call Unsafe_CompareAndSwapInt(JNIEnv_*, _jobject*, _jobject*, long, int, int)}
 0x00000001133db706: vzeroupper 
 0x00000001133db709: and    $0xff,%eax
 0x00000001133db70f: setne  %al
 0x00000001133db712: movl   $0x5,0x288(%r15)
 0x00000001133db71d: lock addl $0x0,-0x40(%rsp)
 0x00000001133db723: cmpl   $0x0,-0xd0bc6dd(%rip)        #     0x000000010631f050
                                            ;   {external_word}

If you look at the native implementation of Unsafe_CompareAndSwapInt, it delegates to another native call, Atomic::cmpxchg, which seems to do the swap atomically.

Why that call is not replaced by a direct lock cmpxchg is a mystery to me.
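The point made in EDIT 2, that lock addl acts as a store-load barrier and concerns visibility rather than atomicity, can be illustrated with the classic store-buffering (Dekker) litmus test. With volatile fields the Java Memory Model forbids both threads reading 0; with plain fields, the x86 store buffer (or the JIT) may allow it. This is a minimal sketch: StoreLoadLitmus and bothZero are hypothetical names, and thread start-up dominates each round, so the reordering may rarely or never show up if volatile is removed.

```java
public class StoreLoadLitmus {

    // volatile: the JMM forbids r1 == 0 && r2 == 0 below. With plain int
    // fields, each thread's load may pass its own earlier store.
    static volatile int x, y;
    static int r1, r2;

    // One round of the store-buffering test.
    static boolean bothZero() throws InterruptedException {
        x = 0;
        y = 0;
        Thread t1 = new Thread(() -> { x = 1; r1 = y; });
        Thread t2 = new Thread(() -> { y = 1; r2 = x; });
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        // join() makes the threads' writes to r1/r2 visible here
        return r1 == 0 && r2 == 0;
    }

    public static void main(String[] args) throws InterruptedException {
        int forbidden = 0;
        for (int i = 0; i < 10_000; i++) {
            if (bothZero()) {
                forbidden++;
            }
        }
        // With volatile fields this count is always 0.
        System.out.println("r1 == r2 == 0 seen " + forbidden + " times");
    }
}
```

On x86, HotSpot typically rules out this outcome by emitting a lock addl-style fence after a volatile store, the same kind of instruction seen in the listing above.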

Answers
Answer #1, added 2016-12-29 15:12

In the first case, a native method is being used. Either the code hasn't been optimised or it's not an intrinsic.

In the second case, an intrinsic has been used to inline the required assembly rather than call a JNI method. I would have thought both cases would do this, but I guess not.

Answer #2, added 2016-12-29 16:12

I believe the lock addl is not the atomic operation itself but a store-load barrier implementation. The atomic operation happens in the callq.

Since you're already logging with PrintIntrinsics, you should check whether it actually gets intrinsified.
