RFR: 8232782: Shenandoah: streamline post-LRB CAS barrier (aarch64) (version 2)

Thu Jul 2 00:35:21 UTC 2020

Thanks for the careful review.  I'll make another round with it.

I wasn’t entirely sure what the preconditions for this function were so I documented my best guess.  Good thing you reviewed my comments carefully because apparently I guessed wrong.

Regarding use of the weak Boolean argument, it was not clear to me why a caller would want to specify weak, but I assumed if they did request it, I should go ahead and honor it.  Are you suggesting I should ignore the value of the weak argument and always behave as if the caller had requested not weak?

Also, wil you check my understanding further?   I believe this service (cmpxchg_oop) is used primarily to resolve races between background GC threads and mutator threads.  There is no need to need for the JVM to resolve races between multiple mutator threads.  That is a programmer responsibility. The reason for cmpxchg_oop is to resolve the following two race conditions:

  1. A mutator thread is overwriting a field that the GC may be overwriting in parallel. The mutator needs to use this protocol to overwrite pointers in order to prevent the following race:
        i. GC fetches from-space pointer at addr
       ii. GC computes equivalent to-space pointer
      iii. Mutator overwrites pointer at addr with new_val
      iv. GC overwrites pointer at addr with to-space replacement of original from-space value, incorrectly clobbering the mutation!
     The GC thread uses cmpxchg_oop to abandon its overwrite attempt in case the value has changed since it began its effort.

  2. A mutator thread fetches a pointer from memory and the LRB discovers that the pointer refers to from-space memory.  After determining the to-space pointer that represents the new location of the object originally referenced by the fetched from-space pointer, the mutator attempts to heal the value held in memory by overwriting that memory with the to-space equivalent.  The healing effort must use cmpxchg_oop to resolve the exact same race condition that might occur when a background GC thread is healing a from-space pointer residing in memory.

In both of the above cases, we know that GC is active and both expected and new_val are not equal to null.  We also know for both of these cases, by the way, that expected refers to from-space, so maybe our implementation of cmpxchg_oop should really be starting with step 2!

Can you help me understand the other situations in which cmpxchg_oop will be called, under which expected and new_val may equal null?

Thanks.

On 7/1/20, 3:56 PM, "Roman Kennke" <rkennke at redhat.com> wrote:

    CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.

    Ah now I remember why we want to do the CAS barrier even for weak-CAS:
    while it is technically true that calling code must be prepared to deal
    with occasional spurious failures, we probably don't want it to get
    stuck in retry-loop until GC gets to update the field - which may be a
    fairly long period of time. Not sure if we can draw any other benefit
    from 'weak' here?

    Another *possible* optimization (while we're at it) is to check if GC
    is actually active (HAS_FORWARDED, see LRB impl) and avoid the whole
    decoding of failure-witness business. This may be something to keep in
    mind should we ever find out that this code affects us outside of GC
    cycles.

    Roman

    On Thu, 2020-07-02 at 00:20 +0200, Roman Kennke wrote:
    > Hi Kelvin,
    >
    > I had something similar in mind with x86, but Aleksey stopped me :-)
    > It
    > may be worth noting that we need a maximum of 3 consequtive CASes to
    > cover all cases, it is not necessary to make a loop:
    >
    > 1. Fast-path CAS#1
    > If that fails, it *may* be because of value-in-memory being a from-
    > space-ref. Check that and do:
    > 2. CAS#2 with expected = previous from CAS#1
    > If that fails, it may be because a competing thread just wrote a to-
    > space-ref of our expected object (i.e. our *original* expected
    > value),
    > try again:
    > 3. CAS#3 with expected = previous from CAS#2 (== original expected)
    > at that point the CAS cannot fail because of false negative because
    > of
    > to-space-invariant
    >
    > Correct me if I am wrong here!
    >
    > Ah I see you optimized for smaller code there:
    > +  // It is extremely rare we reach this point.  For this reason, the
    > +  // implementation opts for smaller rather than potentially faster
    > +  // code.  Ultimately, smaller code for this rare case most likely
    > +  // delivers higher overall throughput by enabling improved icache
    > +  // performance.
    > Good then.
    >
    > Some comments on the patch:
    >
    > // It is required that addr represent a
    > +// memory location at which a non-null reference value is stored and
    > that
    > +// expected holds a non-null reference value.
    >
    > Is this true? I believe the memory location may hold NULL. The
    > expected
    > value may hold NULL too, in which case we can skip the barrier. The
    > compiler optimizes the case where it *knows statically* that expected
    > == NULL to not emit the barrier. We also should have run-time checks
    > for these situations that skip to 'done' on first CAS-failure with
    > NULL
    > for the case when compiler cannot determine statically NULL, or
    > alternatively don't use the _not_null() variants of encode/decode
    > methods.
    >
    > +//  weak:    This relaxes the "strong" property so that CAS is
    > allowed
    > +//           to fail even when the expected value is present in
    > memory.
    > +//           This is useful for load-with-lock, store-conditional
    > +//           loops where certain failures require retries.  If weak
    > is
    > +//           enabled, it is ok to return failure rather than
    > retrying.
    >
    > If we are certain that the meaning of 'weak' allows us to skip the
    > barrier, we can change/remove our corresponding implementation of the
    > weak paths to emit a plain CAS. Right?
    >
    > +  // Try to CAS with given arguments using LL/SC pair.  If
    > successful,
    > +  // then we are done.
    >
    > LL/SC pair is not (generally) true anymore with this updated patch.
    >
    > Other than that, I like the impl.
    >
    > I'd actually like to see a similar implementation in x86, I believe
    > it
    > should be slightly more efficient. However, I'm not sure it's really
    > worth - I don't think I've ever seen this code path very hot
    > anywhere.
    > We'd need to convince Aleksey though - this code has undergone a
    > couple
    > of changes already and it's basically always caused unforeseen
    > troubles
    > (weird register clashes, extremely rare corner cases like the above
    > mentioned 3rd CAS, etc)
    >
    > Thank you!
    > Roman
    >
    >
    > On Tue, 2020-06-30 at 22:39 +0000, Nilsen, Kelvin wrote:
    > > Thank you for feedback from previously distributed draft
    > > patch.  This
    > > new patch is similar to the patch distributed on June 24.  However,
    > > this version uses MacroAssembler::cmpxchng() instead of hard-coding
    > > the use of ldxr/stxr instructions.
    > >
    > > See http://cr.openjdk.java.net/~kdnilsen/JDK-8232782/webrev.02/
    > >
    > > This patch addresses the problem described in
    > > https://bugs.openjdk.java.net/browse/JDK-8232782
    > >
    > > The implementation mimics the behavior of the recently revised x86
    > > implementation of cmpxchg_oop with slight refinements:
    > >
    > > X86 version:
    > > Step 1: Try CAS
    > > Step 2: if CAS fails, check if original memory holds equivalent
    > > from-
    > > space pointer
    > > Step 3: Use CAS to overwrite memory with equivalent to-space
    > > pointer
    > > Step 4: Try CAS again
    > > Step 5: Return boolean result to indicate success or failure
    > >
    > > AARCH64 version:
    > > Step 1: Try CAS
    > > Step 2: if CAS fails, check if original memory holds equivalent
    > > from-
    > > space pointer
    > > Step 3 (differs): Do not overwrite memory with equivalent to-space
    > > pointer,  Instead, run the original CAS request with from-space
    > > pointer as the "expected" value. If this succeeds, we're done.  If
    > > this fails, go back to step 1 and try that again.
    > >
    > > Step 5: Return boolean result to indicate success or failure
    > >
    > > This patch satisfies tier1, tier2, and hotspot_gc_shenandoah
    > > regression tests on Ubuntu 18.04.4 LTS (GNU/Linux 5.3.0-1023-aws
    > > aarch64).  I have also run an "extreme" garbage collection workload
    > > for 20 minutes without problem.
    > >
    > > Is this ok to merge?
    > >
    > >
    > >