RFR: 8216350: AArch64: monitor unlock fast path not called

Nick Gasson (Arm Technology China) Nick.Gasson at arm.com
Tue Jan 8 08:03:43 UTC 2019


Hi,

While looking at the profiling output of some micro-benchmarks for 
locking on AArch64, I noticed that the monitor unlock fast-path in 
aarch64_enc_fast_unlock in aarch64.ad (under label `object_has_monitor') 
is almost never executed, even though the lock in the test is inflated.

To branch to this fast-path we check whether bit #1 is set in the 
displaced header word (dhw) on the stack:

   __ tbnz(disp_hdr, exact_log2(markOopDesc::monitor_value), object_has_monitor);

In the common case, however, the value in the dhw is set by the monitor 
locking fast-path in aarch64_enc_fast_lock, where we use the pointer to 
the dhw as an arbitrary non-null value. The lower three bits of this 
pointer are always zero, so it never triggers the unlock fast-path, 
which looks for bit #1 set, and we fall through to call the runtime to 
unlock the monitor.

   // store a non-null value into the box.
   __ str(box, Address(box, BasicLock::displaced_header_offset_in_bytes()));

It seems that the unlock fast-path will only be executed when the 
monitor was originally locked by the runtime (e.g. when the lock was 
first inflated), because ObjectSynchronizer::slow_enter will store 
markOopDesc::unused_mark into the dhw, and this value has bit #1 set.
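
To make the bit arithmetic concrete, here is a minimal standalone sketch 
(not HotSpot code). The constants mirror the values of 
markOopDesc::monitor_value (2) and markOopDesc::unused_mark (3) in 
markOop.hpp; FakeBox, takes_unlock_fast_path, dhw_before_patch and 
dhw_after_patch are hypothetical names used only for illustration, and 
the 8-byte alignment of the box is an assumption standing in for the 
alignment of the real stack slot.

```cpp
#include <cassert>
#include <cstdint>

// Local copies of the lock-bit encodings; the names are specific to this sketch.
constexpr std::uintptr_t kMonitorValue = 2;  // markOopDesc::monitor_value
constexpr std::uintptr_t kUnusedMark   = 3;  // markOopDesc::unused_mark

// Models the tbnz test: is bit exact_log2(monitor_value), i.e. bit #1, set?
constexpr bool takes_unlock_fast_path(std::uintptr_t dhw) {
  return (dhw & kMonitorValue) != 0;
}

// Hypothetical stand-in for the on-stack BasicLock (assumed 8-byte aligned).
struct alignas(8) FakeBox {
  std::uintptr_t displaced_header;
};

// Value the lock fast-path stores today: the address of the box itself.
inline std::uintptr_t dhw_before_patch(const FakeBox* box) {
  return reinterpret_cast<std::uintptr_t>(box);
}

// Value stored by ObjectSynchronizer::slow_enter, which has bit #1 set.
constexpr std::uintptr_t dhw_after_patch() {
  return kUnusedMark;
}

// unused_mark passes the bit test at compile time.
static_assert(takes_unlock_fast_path(dhw_after_patch()),
              "unused_mark has bit #1 set");

// Runtime check: an aligned box address never passes the bit test.
inline void check_box_address_misses_fast_path() {
  FakeBox box{0};
  assert((dhw_before_patch(&box) & 7) == 0);
  assert(!takes_unlock_fast_path(dhw_before_patch(&box)));
}
```

Calling check_box_address_misses_fast_path() shows the aligned box 
address failing the bit test, while the static_assert shows that the 
unused_mark value passes it.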

Can someone help me review this change to aarch64_enc_fast_lock to use 
markOopDesc::unused_mark as the arbitrary non-null value rather than 
`box' to match ObjectSynchronizer::slow_enter?

Webrev: http://cr.openjdk.java.net/~njian/8216350/webrev.0/
Bug: https://bugs.openjdk.java.net/browse/JDK-8216350

I also removed an unnecessary double branch in the unlock code.

Ran jtreg + jcstress.

I also added a new micro-benchmark to 
test/micro/org/openjdk/bench/vm/lang/LockUnlock.java so you can see this 
behaviour:

Without patch:

Result "org.openjdk.bench.vm.lang.LockUnlock.testContendedLock":
   597.855 ±(99.9%) 73.183 ns/op [Average]
   (min, avg, max) = (438.862, 597.855, 861.028), stdev = 97.697
   CI (99.9%): [524.672, 671.038] (assumes normal distribution)

With patch:

Result "org.openjdk.bench.vm.lang.LockUnlock.testContendedLock":
   219.067 ±(99.9%) 21.146 ns/op [Average]
   (min, avg, max) = (176.379, 219.067, 300.186), stdev = 28.229
   CI (99.9%): [197.921, 240.212] (assumes normal distribution)

This is with -XX:+UseLSE; -XX:-UseLSE shows a similar improvement.

Thanks,
Nick

