Question about CAS via LDREX/STREX on 32-bit arm

Thomas Stüfe thomas.stuefe at gmail.com
Mon Mar 13 05:35:23 UTC 2023


Hi David,


On Mon, Mar 13, 2023 at 3:07 AM David Holmes <david.holmes at oracle.com>
wrote:

> Hi Thomas,
>
> I'm far too rusty on the details to answer most of your questions but:
>
>  > - Do we not need an explicit CLREX after the operation? Or does the
>  > STREX also clear the hardware monitor? Or does it just not matter?
>
> STREX clears the reservation, so CLREX is not needed. From the Arm ARM:
>
> "A Load-Exclusive instruction marks a small block of memory for
> exclusive access. The size of the marked block is IMPLEMENTATION
> DEFINED, see Marking and the size of the marked memory block on page
> B2-105. A Store-Exclusive instruction to any address in the marked block
> clears the marking."
>
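To make the quoted Arm ARM sentence concrete, here is a toy single-core model in C++. It is only a sketch of the rule as quoted, not HotSpot code: the names (ExclusiveMonitor, strex_clears_marking) are made up for illustration, and the model ignores the size of the marked block and any other cores.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of one core's exclusive monitor (hypothetical, for illustration).
struct ExclusiveMonitor {
  bool marked = false;

  // LDREX: load the value and mark the location for exclusive access.
  uint32_t ldrex(const uint32_t& mem) {
    marked = true;
    return mem;
  }

  // STREX: store only if the marking is still present. The marking is
  // cleared either way. Returns 0 on success, 1 on failure.
  int strex(uint32_t& mem, uint32_t value) {
    bool ok = marked;
    marked = false;
    if (ok) mem = value;
    return ok ? 0 : 1;
  }

  // CLREX: drop the marking without storing anything.
  void clrex() { marked = false; }
};

// Checks that a STREX leaves no marking behind: a second STREX without a
// fresh LDREX must fail and must not modify memory.
bool strex_clears_marking() {
  ExclusiveMonitor mon;
  uint32_t word = 5;
  uint32_t seen = mon.ldrex(word);
  int first = mon.strex(word, seen + 1);  // marking present: store happens
  int second = mon.strex(word, 99);       // marking already cleared: fails
  return first == 0 && second == 1 && word == 6 && !mon.marked;
}
```

In this model clrex() after a strex() would be a no-op, which is the point of the quoted sentence: the store itself already cleared the marking.
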
>  > - We have VM_Version::supports_ldrex(). Code seems to me sometimes
>  > guarded by this (e.g MacroAssembler::atomic_cas_bool), sometimes code
>  > just executes ldrex/strex (e.g. the one-shot path of
>  > MacroAssembler::cas_for_lock_acquire). Am I mistaken? Or is LDREX now
>  > generally available? Does ARMv6 mean STREX and LDREX are available?
>
> I think you will find that where the ldrex guard is missing, the code is
> for C2 only and a C2 build is only possible if ldrex is available. (C2
> was only supported on ARMv7+.)
>
> Also the ARM conditional instructions as used in atomic_cas_bool can
> cause confusion when trying to understand the logic. :)
>
>
Thank you, that already helps! Not that rusty, it seems :-)

Cheers, Thomas

> Cheers,
> David
> -----
>
>
> On 11/03/2023 8:18 pm, Thomas Stüfe wrote:
> > Hi ARM experts,
> >
> > I am trying to understand how CAS is implemented on arm; in particular,
> > "MacroAssembler::atomic_cas_bool":
> >
> > MacroAssembler::atomic_cas_bool
> >
> > ```
> >      assert_different_registers(tmp_reg, oldval, newval, base);
> >      Label loop;
> >      bind(loop);
> > A   ldrex(tmp_reg, Address(base, offset));
> > B   subs(tmp_reg, tmp_reg, oldval);
> > C   strex(tmp_reg, newval, Address(base, offset), eq);
> > D   cmp(tmp_reg, 1, eq);
> > E   b(loop, eq);
> > F   cmp(tmp_reg, 0);
> >      if (tmpreg == noreg) {
> >        pop(tmp_reg);
> >      }
> > ```
> >
> > It uses LDREX and STREX to perform a cas of *(base+offset) from oldval
> > to newval. It does so in a loop. The code distinguishes two failures:
> > STREX failing, and a "semantically failed" CAS.
> >
> > Here is what I think this code does:
> >
> > A) LDREX: tmp=*(base+offset)
> > B) tmp -= oldvalue
> >     If *(base+offset) was unchanged, tmp_reg is now 0 and Z is 1
> > C) If Z is 1: STREX the new value: *(base+offset)=newval. Otherwise,
> > omit.
> >     After this, if the store succeeded, tmp_reg is 0; if the store
> > failed, it's 1.
> > D) Here, tmp_reg is: 0 if the store succeeded, 1 if it failed, 1...n if
> > *(base+offset) had been modified before LDREX.
> >     We now compare with 1 and ...
> > E) ...repeat the loop if tmp_reg was 1
> >
> > So we loop until either *(base+offset) had been changed to some other
> > value concurrently before our LDREX, or until our store succeeded.
> >
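A flag-level model of steps A to F in C++ may help when checking the walkthrough. It rests on one assumption that should be verified against the assembler: that the trailing `eq` argument on strex and cmp makes those instructions conditional on the Z flag, as with ARM conditional execution. Under that reading, step D is skipped whenever step B left Z clear, so a tmp_reg of 1 left over from the subtraction would not trigger a retry. The names CasResult, model_atomic_cas_bool and strex_failures are hypothetical; strex_failures is the number of times the modelled store-exclusive fails before it succeeds.

```cpp
#include <cassert>
#include <cstdint>

struct CasResult {
  bool success;    // models the Z flag after step F
  int iterations;  // number of passes through the loop
};

// Hypothetical single-threaded model of the A-F sequence. It assumes that
// strex and the first cmp only execute when Z is set (the "eq" condition).
CasResult model_atomic_cas_bool(uint32_t& mem, uint32_t oldval,
                                uint32_t newval, int strex_failures) {
  int iterations = 0;
  for (;;) {
    ++iterations;
    uint32_t tmp = mem;  // A: ldrex
    tmp -= oldval;       // B: subs, sets Z if the values matched
    bool z = (tmp == 0);
    if (z) {             // C: strex ..., eq (skipped when Z is clear)
      if (strex_failures > 0) {
        --strex_failures;
        tmp = 1;         // store-exclusive failed
      } else {
        mem = newval;
        tmp = 0;         // store-exclusive succeeded
      }
    }
    if (z) {             // D: cmp tmp, 1, eq (skipped when Z is clear)
      z = (tmp == 1);
    }
    if (z) continue;     // E: b loop, eq
    return {tmp == 0, iterations};  // F: cmp tmp, 0
  }
}
```

With mem == oldval and two injected store failures the model takes three iterations and succeeds. With mem == oldval + 1 it returns failure after a single iteration, because C and D are both skipped and E sees Z clear.
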
> > I wondered what the loop guards against. And why it would be okay
> > sometimes to omit it.
> >
> > IIUC, STREX fails if the core lost its exclusive access to the
> > memory location since the LDREX. This can be one of three things,
> > right?
> > - another core slipped in an LDREX+STREX to the same location between
> > our LDREX and STREX
> > - Or we context switched to another thread or process. I assume it does
> > a CLREX then, right? Because how could you prevent a sequence like
> > "LDREX(p1) -> switch -> LDREX(p2) -> switch back -> STREX(p1)" - if I
> > understand the ARM manual [1] correctly, a STREX to a different location
> > than the preceding LDREX is undefined.
> > - Or we had a signal after LDREX and did a second LDREX in the signal
> > handler. Does the kernel do a CLREX when invoking a signal handler?
> >
> > More questions:
> >
> > - If I got it right, at (D), tmp_reg value "1" has two meanings: either
> > STREX failed or some thread increased the value concurrently by 1. We
> > repeat the loop either way. Is this just accepted behavior? Increasing
> > by 1 is maybe not that rare.
> >
> > - If I understood this correctly, the loop guards us mainly against
> > context switches. Without the loop a context switch would count as a
> > "semantically failed" CAS. Why would that be okay? Should we not do this
> > loop always?
> >
> > - Do we not need an explicit CLREX after the operation? Or does the
> > STREX also clear the hardware monitor? Or does it just not matter?
> >
> > - We have VM_Version::supports_ldrex(). Code seems to me sometimes
> > guarded by this (e.g MacroAssembler::atomic_cas_bool), sometimes code
> > just executes ldrex/strex (e.g. the one-shot path of
> > MacroAssembler::cas_for_lock_acquire). Am I mistaken? Or is LDREX now
> > generally available? Does ARMv6 mean STREX and LDREX are available?
> >
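On the question of looping versus one-shot: standard C++ has the same distinction, which may be a useful analogy (this is plain std::atomic code, not HotSpot code). compare_exchange_weak is allowed to fail spuriously, much like a single LDREX/STREX attempt, so a one-shot use is only safe where the caller treats "failed" as "take the slow path" rather than "the value was different". Whether the one-shot callers in HotSpot are of that kind is an assumption here, not something checked. The function names cas_one_shot and cas_looping are made up for this sketch.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// One attempt only. May return false even though the word held `expected`
// (spurious failure), so the caller must tolerate a false negative.
bool cas_one_shot(std::atomic<uint32_t>& word, uint32_t expected,
                  uint32_t desired) {
  return word.compare_exchange_weak(expected, desired);
}

// Retries spurious failures. Returns false only if a value other than
// `oldval` was actually observed.
bool cas_looping(std::atomic<uint32_t>& word, uint32_t oldval,
                 uint32_t desired) {
  uint32_t expected = oldval;
  while (!word.compare_exchange_weak(expected, desired)) {
    // On failure, `expected` now holds the value that was observed.
    if (expected != oldval) return false;  // genuinely different value
    // Otherwise the failure was spurious: try again.
  }
  return true;
}
```

cas_looping gives the "fails only if the value differed" guarantee that the loop in atomic_cas_bool appears to provide; cas_one_shot gives the weaker guarantee of a single attempt.
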
> >
> > Thanks a lot!
> >
> > Cheers, Thomas
> >
> >
> >
>