RFR: 8216350: AArch64: monitor unlock fast path not called
Nick Gasson (Arm Technology China)
Nick.Gasson at arm.com
Tue Jan 8 08:03:43 UTC 2019
Hi,
While looking at the profiling output of some micro-benchmarks for
locking on AArch64, I noticed that the monitor unlock fast-path in
aarch64_enc_fast_unlock in aarch64.ad (under label `object_has_monitor')
is almost never executed, even though the lock in the test is inflated.
To branch to this fast-path we check whether bit #1 is set in the
displaced header word (dhw) on the stack:
  __ tbnz(disp_hdr, exact_log2(markOopDesc::monitor_value),
          object_has_monitor);
In the common case, however, the value in the dhw is set by the monitor
locking fast-path in aarch64_enc_fast_lock, where we use the pointer to
the dhw as an arbitrary non-null value. The lower three bits of this
pointer are always zero, so it never triggers the unlock fast-path,
which looks for bit #1 set, and we fall through to call the runtime to
unlock the monitor.
  // store a non-null value into the box.
  __ str(box, Address(box, BasicLock::displaced_header_offset_in_bytes()));
It seems that the unlock fast-path will only be executed when the
monitor was originally locked by the runtime (e.g. when the lock was
first inflated), because ObjectSynchronizer::slow_enter will store
markOopDesc::unused_mark into the dhw, and this value has bit #1 set.
Can someone help me review this change to aarch64_enc_fast_lock to use
markOopDesc::unused_mark as the arbitrary non-null value rather than
`box' to match ObjectSynchronizer::slow_enter?
Webrev: http://cr.openjdk.java.net/~njian/8216350/webrev.0/
Bug: https://bugs.openjdk.java.net/browse/JDK-8216350
I also removed an unnecessary double branch in the unlock code.
Ran jtreg + jcstress.
I also added a new micro-benchmark to
test/micro/org/openjdk/bench/vm/lang/LockUnlock.java so you can see this
behaviour:
Without patch:
Result "org.openjdk.bench.vm.lang.LockUnlock.testContendedLock":
597.855 ±(99.9%) 73.183 ns/op [Average]
(min, avg, max) = (438.862, 597.855, 861.028), stdev = 97.697
CI (99.9%): [524.672, 671.038] (assumes normal distribution)
With patch:
Result "org.openjdk.bench.vm.lang.LockUnlock.testContendedLock":
219.067 ±(99.9%) 21.146 ns/op [Average]
(min, avg, max) = (176.379, 219.067, 300.186), stdev = 28.229
CI (99.9%): [197.921, 240.212] (assumes normal distribution)
This is with -XX:+UseLSE; with -XX:-UseLSE the improvement is similar.
Thanks,
Nick