RFR: 8232782: Shenandoah: streamline post-LRB CAS barrier (aarch64) (version 2)
Roman Kennke
rkennke at redhat.com
Wed Jul 1 22:55:27 UTC 2020
Ah now I remember why we want to do the CAS barrier even for weak-CAS:
while it is technically true that calling code must be prepared to deal
with occasional spurious failures, we probably don't want it to get
stuck in retry-loop until GC gets to update the field - which may be a
fairly long period of time. Not sure if we can draw any other benefit
from 'weak' here?
Another *possible* optimization (while we're at it) is to check if GC
is actually active (HAS_FORWARDED, see LRB impl) and avoid the whole
decoding of failure-witness business. This may be something to keep in
mind should we ever find out that this code affects us outside of GC
cycles.
Roman
On Thu, 2020-07-02 at 00:20 +0200, Roman Kennke wrote:
> Hi Kelvin,
>
> I had something similar in mind with x86, but Aleksey stopped me :-)
> It
> may be worth noting that we need a maximum of 3 consequtive CASes to
> cover all cases, it is not necessary to make a loop:
>
> 1. Fast-path CAS#1
> If that fails, it *may* be because of value-in-memory being a from-
> space-ref. Check that and do:
> 2. CAS#2 with expected = previous from CAS#1
> If that fails, it may be because a competing thread just wrote a to-
> space-ref of our expected object (i.e. our *original* expected
> value),
> try again:
> 3. CAS#3 with expected = previous from CAS#2 (== original expected)
> at that point the CAS cannot fail because of false negative because
> of
> to-space-invariant
>
> Correct me if I am wrong here!
>
> Ah I see you optimized for smaller code there:
> + // It is extremely rare we reach this point. For this reason, the
> + // implementation opts for smaller rather than potentially faster
> + // code. Ultimately, smaller code for this rare case most likely
> + // delivers higher overall throughput by enabling improved icache
> + // performance.
> Good then.
>
> Some comments on the patch:
>
> // It is required that addr represent a
> +// memory location at which a non-null reference value is stored and
> that
> +// expected holds a non-null reference value.
>
> Is this true? I believe the memory location may hold NULL. The
> expected
> value may hold NULL too, in which case we can skip the barrier. The
> compiler optimizes the case where it *knows statically* that expected
> == NULL to not emit the barrier. We also should have run-time checks
> for these situations that skip to 'done' on first CAS-failure with
> NULL
> for the case when compiler cannot determine statically NULL, or
> alternatively don't use the _not_null() variants of encode/decode
> methods.
>
> +// weak: This relaxes the "strong" property so that CAS is
> allowed
> +// to fail even when the expected value is present in
> memory.
> +// This is useful for load-with-lock, store-conditional
> +// loops where certain failures require retries. If weak
> is
> +// enabled, it is ok to return failure rather than
> retrying.
>
> If we are certain that the meaning of 'weak' allows us to skip the
> barrier, we can change/remove our corresponding implementation of the
> weak paths to emit a plain CAS. Right?
>
> + // Try to CAS with given arguments using LL/SC pair. If
> successful,
> + // then we are done.
>
> LL/SC pair is not (generally) true anymore with this updated patch.
>
> Other than that, I like the impl.
>
> I'd actually like to see a similar implementation in x86, I believe
> it
> should be slightly more efficient. However, I'm not sure it's really
> worth - I don't think I've ever seen this code path very hot
> anywhere.
> We'd need to convince Aleksey though - this code has undergone a
> couple
> of changes already and it's basically always caused unforeseen
> troubles
> (weird register clashes, extremely rare corner cases like the above
> mentioned 3rd CAS, etc)
>
> Thank you!
> Roman
>
>
> On Tue, 2020-06-30 at 22:39 +0000, Nilsen, Kelvin wrote:
> > Thank you for feedback from previously distributed draft
> > patch. This
> > new patch is similar to the patch distributed on June 24. However,
> > this version uses MacroAssembler::cmpxchng() instead of hard-coding
> > the use of ldxr/stxr instructions.
> >
> > See http://cr.openjdk.java.net/~kdnilsen/JDK-8232782/webrev.02/
> >
> > This patch addresses the problem described in
> > https://bugs.openjdk.java.net/browse/JDK-8232782
> >
> > The implementation mimics the behavior of the recently revised x86
> > implementation of cmpxchg_oop with slight refinements:
> >
> > X86 version:
> > Step 1: Try CAS
> > Step 2: if CAS fails, check if original memory holds equivalent
> > from-
> > space pointer
> > Step 3: Use CAS to overwrite memory with equivalent to-space
> > pointer
> > Step 4: Try CAS again
> > Step 5: Return boolean result to indicate success or failure
> >
> > AARCH64 version:
> > Step 1: Try CAS
> > Step 2: if CAS fails, check if original memory holds equivalent
> > from-
> > space pointer
> > Step 3 (differs): Do not overwrite memory with equivalent to-space
> > pointer, Instead, run the original CAS request with from-space
> > pointer as the "expected" value. If this succeeds, we're done. If
> > this fails, go back to step 1 and try that again.
> >
> > Step 5: Return boolean result to indicate success or failure
> >
> > This patch satisfies tier1, tier2, and hotspot_gc_shenandoah
> > regression tests on Ubuntu 18.04.4 LTS (GNU/Linux 5.3.0-1023-aws
> > aarch64). I have also run an "extreme" garbage collection workload
> > for 20 minutes without problem.
> >
> > Is this ok to merge?
> >
> >
> >
More information about the shenandoah-dev
mailing list