RFR: 8217368: AArch64: C2 recursive stack locking optimisation not triggered
Nick Gasson (Arm Technology China)
Nick.Gasson at arm.com
Fri Jan 18 08:40:25 UTC 2019
Hi,
While I was cleaning up the patch for 8216350 I noticed an issue in the
implementation of recursive locking in aarch64_enc_fast_lock:
Bug: https://bugs.openjdk.java.net/browse/JDK-8217368
Webrev: http://cr.openjdk.java.net/~ngasson/8217368/webrev.0/
First we load the markOop of the object we want to lock and OR it with
markOopDesc::unlocked_value (1). Then we do a CAS to exchange the
address of the box on our thread's stack with the object's header word
iff it's equal to the (markOop | 1) we just computed. If this fails,
then we should check for a recursive lock by comparing
(~(page size - 1) | 3) & (markOop - SP) == 0
Where "markOop" is the current object header word loaded by the failed
CAS. This checks that the lock bits are zero (locked) and the stack
address of the displaced header is within one page of the current SP.
But on AArch64 we actually do this:
(~(page size - 1) | 3) & ((old markOop | 1) - SP) == 0
Where "old markOop | 1" is the compare-to value used for the CAS. This
is always false as the result has at least bit #0 set. This only affects
C2, the C1_MacroAssembler version has the correct test.
The diff looks big but all it does is swap the usage of registers `tmp'
and `disp_hdr' in the first section so the markOop loaded by the CAS
ends up in disp_hdr and tmp holds the (markOop | 1) compare-to value.
Ran jtreg, plus jcstress with -XX:+UseLSE and -XX:-UseLSE. Also added
another microbenchmark to
micro/org/openjdk/bench/vm/lang/LockUnlock.java as I couldn't find an
existing JMH case that triggered this.
Without patch:
Result
"org.openjdk.bench.vm.lang.LockUnlock.testRecursiveSynchronizationNoBias":
510.781 ?(99.9%) 1.196 ns/op [Average]
(min, avg, max) = (508.769, 510.781, 513.854), stdev = 1.597
CI (99.9%): [509.585, 511.977] (assumes normal distribution)
With patch:
Result
"org.openjdk.bench.vm.lang.LockUnlock.testRecursiveSynchronizationNoBias":
197.038 ?(99.9%) 0.096 ns/op [Average]
(min, avg, max) = (196.886, 197.038, 197.296), stdev = 0.128
CI (99.9%): [196.942, 197.134] (assumes normal distribution)
Two other minor things:
* Does anyone know what the comment "// Load Compare Value application
register." means? It's present in the PPC and S390 ports too.
* The x86 port #ifdef LP64 uses "7 - os::vm_page_size()" as the mask in
the recursive lock test. I think the "7" here is
markOopDesc::biased_lock_mask and is presumably there to prevent a
silent mutual exclusion failure if a markOop with the bias locking bits
set ends up the fast_lock path (although this should never happen).
Should we change markOopDesc::lock_mask_in_place to
markOopDesc::biased_lock_mask_in_place in the AArch64 port too?
Thanks,
Nick
More information about the hotspot-compiler-dev
mailing list