[aarch64-port-dev ] RFR: 8216350: AArch64: monitor unlock fast path not called

Nick Gasson (Arm Technology China) Nick.Gasson at arm.com
Wed Jan 9 02:50:53 UTC 2019


Hi Derek

> My only comments are to fix up some comments (pre-existing), and some trivial cleanups of pre-existing code. These are judgement calls, and it would be good to get the approval of at least one Andrew.

I agree all of these are good, especially #3 which obscures the symmetry 
between the lock and unlock functions. But I think we ought to create a 
separate patch, to separate code cleanup with no functional change from 
this patch which is a bug fix / functional change?

Also two minor things:

* There is inconsistent (four space) indentation under "// Check if it 
is still a light weight lock ..." in aarch64_enc_fast_unlock.

* At the end of aarch64_enc_fast_lock there is a commented out block "// 
PPC port checks the following invariants": I guess we should either 
implement these if we think they're useful or delete this whole block. 
(FWIW x86 doesn't do any extra checking #ifdef ASSERT).

Finally we could also consider moving these two functions into 
macroAssembler_aarch64.cpp to match the other ports.

Thanks,
Nick

On 09/01/2019 10:19, Derek White wrote:
> Hi Nick,
> 
> Very nice find!
> 
> My only comments are to fix up some comments (pre-existing), and some trivial cleanups of pre-existing code. These are judgement calls, and it would be good to get the approval of at least one Andrew.
> 
> Comments:
> 1) The TODO comment before aarch64_enc_fast_unlock() has been done since 2014, so it can go away.
> 
> 2) In aarch64_enc_fast_lock() and aarch64_enc_fast_unlock(), there are three comment blocks referring to old code using cmpxchgptr. At this point in time I find the new code clearer, and these comments don't add much?
> 
> Cleanup suggestions (untested!):
> 3) In aarch64_enc_fast_lock():
>      // we can use AArch64's bit test and branch here but
>      // markoopDesc does not define a bit index just the bit value
>      // so assert in case the bit pos changes
> #   define __monitor_value_log2 1
>      assert(markOopDesc::monitor_value == (1 << __monitor_value_log2), "incorrect bit position");
>      __ tbnz(disp_hdr, __monitor_value_log2, object_has_monitor);
> #   undef __monitor_value_log2
> 
> Can be replaced with:
>       __ tbnz(disp_hdr, exact_log2(markOopDesc::monitor_value), object_has_monitor);
> It looks like this was fixed in several places a long time ago, but this one got missed.
> 
> 4)  Slightly better comment for last instruction of fast_unlock (and explicitly use zr).
>      __ stlr(zr, tmp); // set unowned
> 
> - Derek
> 
> 
> --------------------- Patch on original code (not your patch, sorry!) -----------------------------
> --- src/hotspot/cpu/aarch64/aarch64.ad
> +++ src/hotspot/cpu/aarch64/aarch64.ad
> @@ -3418,13 +3418,7 @@
>       }
>   
>       // Handle existing monitor
> -    // we can use AArch64's bit test and branch here but
> -    // markoopDesc does not define a bit index just the bit value
> -    // so assert in case the bit pos changes
> -#   define __monitor_value_log2 1
> -    assert(markOopDesc::monitor_value == (1 << __monitor_value_log2), "incorrect bit position");
> -    __ tbnz(disp_hdr, __monitor_value_log2, object_has_monitor);
> -#   undef __monitor_value_log2
> +    __ tbnz(disp_hdr, exact_log2(markOopDesc::monitor_value), object_has_monitor);
>   
>       // Set displaced_header to be (markOop of object | UNLOCK_VALUE).
>       __ orr(disp_hdr, disp_hdr, markOopDesc::unlocked_value);
> @@ -3455,14 +3449,6 @@
>         __ b(retry_load);
>       }
>   
> -    // Formerly:
> -    // __ cmpxchgptr(/*oldv=*/disp_hdr,
> -    //               /*newv=*/box,
> -    //               /*addr=*/oop,
> -    //               /*tmp=*/tmp,
> -    //               cont,
> -    //               /*fail*/NULL);
> -
>       assert(oopDesc::mark_offset_in_bytes() == 0, "offset of _mark is not 0");
>   
>       // If the compare-and-exchange succeeded, then we found an unlocked
> @@ -3511,15 +3497,6 @@
>         __ bind(fail);
>       }
>   
> -    // Label next;
> -    // __ cmpxchgptr(/*oldv=*/disp_hdr,
> -    //               /*newv=*/rthread,
> -    //               /*addr=*/tmp,
> -    //               /*tmp=*/rscratch1,
> -    //               /*succeed*/next,
> -    //               /*fail*/NULL);
> -    // __ bind(next);
> -
>       // store a non-null value into the box.
>       __ str(box, Address(box, BasicLock::displaced_header_offset_in_bytes()));
>   
> @@ -3544,9 +3521,6 @@
>   
>     %}
>   
> -  // TODO
> -  // reimplement this with custom cmpxchgptr code
> -  // which avoids some of the unnecessary branching
>     enc_class aarch64_enc_fast_unlock(iRegP object, iRegP box, iRegP tmp, iRegP tmp2) %{
>       MacroAssembler _masm(&cbuf);
>       Register oop = as_Register($object$$reg);
> @@ -3597,12 +3571,6 @@
>           __ b(retry_load);
>         }
>   
> -    // __ cmpxchgptr(/*compare_value=*/box,
> -    //               /*exchange_value=*/disp_hdr,
> -    //               /*where=*/oop,
> -    //               /*result=*/tmp,
> -    //               cont,
> -    //               /*cas_failed*/NULL);
>       assert(oopDesc::mark_offset_in_bytes() == 0, "offset of _mark is not 0");
>   
>       __ bind(cas_failed);
> @@ -3626,7 +3594,7 @@
>       __ cbnz(rscratch1, cont);
>       // need a release store here
>       __ lea(tmp, Address(tmp, ObjectMonitor::owner_offset_in_bytes()));
> -    __ stlr(rscratch1, tmp); // rscratch1 is zero
> +    __ stlr(zr, tmp); // set unowned
>   
>       __ bind(cont);
>       // flag == EQ indicates success
> 
> 
>> -----Original Message-----
>> From: aarch64-port-dev <aarch64-port-dev-bounces at openjdk.java.net> On
>> Behalf Of Nick Gasson (Arm Technology China)
>> Sent: Tuesday, January 08, 2019 3:04 AM
>> To: hotspot-compiler-dev at openjdk.java.net compiler <hotspot-compiler-
>> dev at openjdk.java.net>
>> Cc: nd <nd at arm.com>; aarch64-port-dev at openjdk.java.net
>> Subject: [EXT] [aarch64-port-dev ] RFR: 8216350: AArch64: monitor unlock
>> fast path not called
>>
>> ----------------------------------------------------------------------
>> Hi,
>>
>> While looking at the profiling output of some micro-benchmarks for locking
>> on AArch64, I noticed that the monitor unlock fast-path in
>> aarch64_enc_fast_unlock in aarch64.ad (under label `object_has_monitor') is
>> almost never executed, even though the lock in the test is inflated.
>>
>> In order to branch to this fast-path we check if bit #1 is set in the displaced
>> header word on the stack:
>>
>>     __ tbnz(disp_hdr, exact_log2(markOopDesc::monitor_value),
>> object_has_monitor);
>>
>> But in the common case the value in the dhw is set by the monitor locking
>> fast-path in aarch64_enc_fast_lock, where we use the pointer to the dhw as
>> an arbitrary non-null value. But the lower three bits of this pointer will
>> always be zero, and so won't trigger the unlock fast-path which is looking for
>> bit #1 set, and we will fall through to call the runtime to unlock the monitor.
>>
>>     // store a non-null value into the box.
>>     __ str(box, Address(box, BasicLock::displaced_header_offset_in_bytes()));
>>
>> It seems that the unlock fast-path will only be executed when the monitor
>> was originally locked by the runtime (e.g. when the lock was first inflated),
>> because ObjectSynchronizer::slow_enter will store
>> markOopDesc::unused_mark into the dhw, and this value has bit #1 set.
>>
>> Can someone help me review this change to aarch64_enc_fast_lock to use
>> markOopDesc::unused_mark as the arbitrary non-null value rather than `box'
>> to match ObjectSynchronizer::slow_enter?
>>
>> Webrev: http://cr.openjdk.java.net/~njian/8216350/webrev.0/
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8216350
>>
>> Also removed an unnecessary double branch in the unlock code.
>>
>> Ran jtreg + jcstress.
>>
>> I also added a new micro-benchmark to
>> test/micro/org/openjdk/bench/vm/lang/LockUnlock.java so you can see this
>> behaviour:
>>
>> Without patch:
>>
>> Result "org.openjdk.bench.vm.lang.LockUnlock.testContendedLock":
>>     597.855 ?(99.9%) 73.183 ns/op [Average]
>>     (min, avg, max) = (438.862, 597.855, 861.028), stdev = 97.697
>>     CI (99.9%): [524.672, 671.038] (assumes normal distribution)
>>
>> With patch:
>>
>> Result "org.openjdk.bench.vm.lang.LockUnlock.testContendedLock":
>>     219.067 ?(99.9%) 21.146 ns/op [Average]
>>     (min, avg, max) = (176.379, 219.067, 300.186), stdev = 28.229
>>     CI (99.9%): [197.921, 240.212] (assumes normal distribution)
>>
>> This is with -XX:+UseLSE, -UseLSE has a similar improvement.
>>
>> Thanks,
>> Nick


More information about the aarch64-port-dev mailing list