RFR: 8217368: AArch64: C2 recursive stack locking optimisation not triggered

Nick Gasson (Arm Technology China) Nick.Gasson at arm.com
Fri Jan 18 08:40:25 UTC 2019


Hi,

While I was cleaning up the patch for 8216350 I noticed an issue in the 
implementation of recursive locking in aarch64_enc_fast_lock:

Bug: https://bugs.openjdk.java.net/browse/JDK-8217368
Webrev: http://cr.openjdk.java.net/~ngasson/8217368/webrev.0/

First we load the markOop of the object we want to lock and OR it with 
markOopDesc::unlocked_value (1). Then we do a CAS to exchange the 
address of the box on our thread's stack with the object's header word 
iff it's equal to the (markOop | 1) we just computed. If this fails, 
then we should check for a recursive lock by comparing

   (~(page size - 1) | 3) & (markOop - SP) == 0

Where "markOop" is the current object header word loaded by the failed 
CAS. This checks that the lock bits are zero (locked) and the stack 
address of the displaced header is within one page of the current SP. 
But on AArch64 we actually do this:

   (~(page size - 1) | 3) & ((old markOop | 1) - SP) == 0

Where "old markOop | 1" is the compare-to value used for the CAS. This 
is always false as the result has at least bit #0 set. This only affects 
C2, the C1_MacroAssembler version has the correct test.

The diff looks big but all it does is swap the usage of registers `tmp' 
and `disp_hdr' in the first section so the markOop loaded by the CAS 
ends up in disp_hdr and tmp holds the (markOop | 1) compare-to value.

Ran jtreg, plus jcstress with -XX:+UseLSE and -XX:-UseLSE. Also added 
another microbenchmark to 
micro/org/openjdk/bench/vm/lang/LockUnlock.java as I couldn't find an 
existing JMH case that triggered this.

Without patch:

Result 
"org.openjdk.bench.vm.lang.LockUnlock.testRecursiveSynchronizationNoBias":
  510.781 ?(99.9%) 1.196 ns/op [Average]
  (min, avg, max) = (508.769, 510.781, 513.854), stdev = 1.597
  CI (99.9%): [509.585, 511.977] (assumes normal distribution)

With patch:

Result 
"org.openjdk.bench.vm.lang.LockUnlock.testRecursiveSynchronizationNoBias": 

  197.038 ?(99.9%) 0.096 ns/op [Average]
  (min, avg, max) = (196.886, 197.038, 197.296), stdev = 0.128
  CI (99.9%): [196.942, 197.134] (assumes normal distribution)

Two other minor things:

* Does anyone know what the comment "// Load Compare Value application 
register." means? It's present in the PPC and S390 ports too.

* The x86 port #ifdef LP64 uses "7 - os::vm_page_size()" as the mask in 
the recursive lock test. I think the "7" here is 
markOopDesc::biased_lock_mask and is presumably there to prevent a 
silent mutual exclusion failure if a markOop with the bias locking bits 
set ends up the fast_lock path (although this should never happen). 
Should we change markOopDesc::lock_mask_in_place to 
markOopDesc::biased_lock_mask_in_place in the AArch64 port too?

Thanks,
Nick


More information about the hotspot-compiler-dev mailing list