RFR: 8232782: Shenandoah: streamline post-LRB CAS barrier (aarch64) (version 2)

Roman Kennke rkennke at redhat.com
Wed Jul 1 22:20:33 UTC 2020


Hi Kelvin,

I had something similar in mind with x86, but Aleksey stopped me :-) It
may be worth noting that we need a maximum of 3 consequtive CASes to
cover all cases, it is not necessary to make a loop:

1. Fast-path CAS#1
If that fails, it *may* be because of value-in-memory being a from-
space-ref. Check that and do:
2. CAS#2 with expected = previous from CAS#1
If that fails, it may be because a competing thread just wrote a to-
space-ref of our expected object (i.e. our *original* expected value),
try again:
3. CAS#3 with expected = previous from CAS#2 (== original expected)
at that point the CAS cannot fail because of false negative because of
to-space-invariant

Correct me if I am wrong here!

Ah I see you optimized for smaller code there:
+  // It is extremely rare we reach this point.  For this reason, the
+  // implementation opts for smaller rather than potentially faster
+  // code.  Ultimately, smaller code for this rare case most likely
+  // delivers higher overall throughput by enabling improved icache
+  // performance.
Good then.

Some comments on the patch:

// It is required that addr represent a
+// memory location at which a non-null reference value is stored and
that
+// expected holds a non-null reference value.

Is this true? I believe the memory location may hold NULL. The expected
value may hold NULL too, in which case we can skip the barrier. The
compiler optimizes the case where it *knows statically* that expected
== NULL to not emit the barrier. We also should have run-time checks
for these situations that skip to 'done' on first CAS-failure with NULL
for the case when compiler cannot determine statically NULL, or
alternatively don't use the _not_null() variants of encode/decode
methods.

+//  weak:    This relaxes the "strong" property so that CAS is allowed
+//           to fail even when the expected value is present in
memory.
+//           This is useful for load-with-lock, store-conditional
+//           loops where certain failures require retries.  If weak is
+//           enabled, it is ok to return failure rather than retrying.

If we are certain that the meaning of 'weak' allows us to skip the
barrier, we can change/remove our corresponding implementation of the
weak paths to emit a plain CAS. Right?

+  // Try to CAS with given arguments using LL/SC pair.  If successful,
+  // then we are done.

LL/SC pair is not (generally) true anymore with this updated patch.

Other than that, I like the impl.

I'd actually like to see a similar implementation in x86, I believe it
should be slightly more efficient. However, I'm not sure it's really
worth - I don't think I've ever seen this code path very hot anywhere.
We'd need to convince Aleksey though - this code has undergone a couple
of changes already and it's basically always caused unforeseen troubles
(weird register clashes, extremely rare corner cases like the above
mentioned 3rd CAS, etc)

Thank you!
Roman


On Tue, 2020-06-30 at 22:39 +0000, Nilsen, Kelvin wrote:
> Thank you for feedback from previously distributed draft patch.  This
> new patch is similar to the patch distributed on June 24.  However,
> this version uses MacroAssembler::cmpxchng() instead of hard-coding
> the use of ldxr/stxr instructions.
> 
> See http://cr.openjdk.java.net/~kdnilsen/JDK-8232782/webrev.02/
> 
> This patch addresses the problem described in 
> https://bugs.openjdk.java.net/browse/JDK-8232782
> 
> The implementation mimics the behavior of the recently revised x86
> implementation of cmpxchg_oop with slight refinements:
> 
> X86 version:
> Step 1: Try CAS
> Step 2: if CAS fails, check if original memory holds equivalent from-
> space pointer
> Step 3: Use CAS to overwrite memory with equivalent to-space pointer
> Step 4: Try CAS again
> Step 5: Return boolean result to indicate success or failure
> 
> AARCH64 version:
> Step 1: Try CAS
> Step 2: if CAS fails, check if original memory holds equivalent from-
> space pointer
> Step 3 (differs): Do not overwrite memory with equivalent to-space
> pointer,  Instead, run the original CAS request with from-space
> pointer as the "expected" value. If this succeeds, we're done.  If
> this fails, go back to step 1 and try that again.
> 
> Step 5: Return boolean result to indicate success or failure
> 
> This patch satisfies tier1, tier2, and hotspot_gc_shenandoah
> regression tests on Ubuntu 18.04.4 LTS (GNU/Linux 5.3.0-1023-aws
> aarch64).  I have also run an "extreme" garbage collection workload
> for 20 minutes without problem.
> 
> Is this ok to merge?
> 
> 
> 



More information about the shenandoah-dev mailing list