Question about CAS via LDREX/STREX on 32-bit arm

David Holmes david.holmes at oracle.com
Mon Mar 13 02:06:55 UTC 2023


Hi Thomas,

I'm far too rusty on the details to answer most of your questions but:

 > - Do we not need an explicit CLREX after the operation? Or does the
 > STREX also clear the hardware monitor? Or does it just not matter?

STREX clears the reservation, so CLREX is not needed. From the Arm ARM:

"A Load-Exclusive instruction marks a small block of memory for 
exclusive access. The size of the marked block is IMPLEMENTATION 
DEFINED, see Marking and the size of the marked memory block on page 
B2-105. A Store-Exclusive instruction to any address in the marked block 
clears the marking."
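
To make the quoted rule concrete, here is a toy C++ model of one core's local exclusive monitor. It is only a sketch under stated assumptions: `ToyMonitor` and its members are made-up names, the marked block is assumed to be exactly one word, and nothing here is real hardware behaviour or HotSpot code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical toy model of a local exclusive monitor.
// Assumption: the marked block is a single 32-bit word.
struct ToyMonitor {
  uint32_t word = 0;
  bool marked = false;

  // Load-Exclusive: read the word and mark it.
  uint32_t ldrex() { marked = true; return word; }

  // Store-Exclusive: returns the status value, 0 = stored, 1 = not stored.
  // The marking is cleared either way, so no CLREX is needed afterwards.
  uint32_t strex(uint32_t value) {
    bool ok = marked;
    marked = false;
    if (ok) word = value;
    return ok ? 0u : 1u;
  }

  // CLREX: clear the marking without storing anything.
  void clrex() { marked = false; }
};

// Small usage example: a second STREX without a new LDREX must fail.
void demo_strex_clears_marking() {
  ToyMonitor m;
  m.ldrex();
  assert(m.strex(7) == 0);  // first store succeeds
  assert(!m.marked);        // and leaves no marking behind
  assert(m.strex(9) == 1);  // so this store is rejected
  assert(m.word == 7);
}
```

In this model an explicit `clrex()` only matters when an LDREX is not followed by a STREX at all.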

 > - We have VM_Version::supports_ldrex(). Code seems to me sometimes
 > guarded by this (e.g MacroAssembler::atomic_cas_bool), sometimes code
 > just executes ldrex/strex (e.g. the one-shot path of
 > MacroAssembler::cas_for_lock_acquire). Am I mistaken? Or is LDREX now
 > generally available? Does ARMv6 mean STREX and LDREX are available?

I think you will find that where the ldrex guard is missing, the code is 
for C2 only and a C2 build is only possible if ldrex is available. (C2 
was only supported on ARMv7+.)

Also the ARM conditional instructions as used in atomic_cas_bool can 
cause confusion when trying to understand the logic. :)
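
One way to read the sequence is to model the Z flag explicitly. The sketch below is a plain C++ rendering of steps A-F, not HotSpot code: `CasModel`, `cas_bool_model` and the failure counter are hypothetical names, and a "failed STREX" is simply simulated by a counter. The point it tries to show is that the `eq`-conditional instructions C, D and E are skipped when the `subs` result is non-zero, so a loaded value that differs from oldval by exactly 1 does not cause a retry.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical single-word memory whose STREX can be told to fail a few
// times, standing in for a lost reservation (e.g. a context switch).
struct CasModel {
  uint32_t mem = 0;
  int strex_failures_left = 0;

  uint32_t ldrex() const { return mem; }

  // Returns the STREX status: 0 = stored, 1 = not stored.
  uint32_t strex(uint32_t value) {
    if (strex_failures_left > 0) { --strex_failures_left; return 1; }
    mem = value;
    return 0;
  }
};

// Follows steps A-F of the quoted sequence; `z` plays the role of the Z flag.
// Returns the final Z flag (true = CAS succeeded) and counts loop passes.
bool cas_bool_model(CasModel& m, uint32_t oldval, uint32_t newval, int& passes) {
  uint32_t tmp;
  bool z;
  do {
    ++passes;
    tmp = m.ldrex();                // A: ldrex
    tmp -= oldval; z = (tmp == 0);  // B: subs sets Z
    if (z) tmp = m.strex(newval);   // C: strex, executed only if eq
    if (z) z = (tmp == 1);          // D: cmp with 1, executed only if eq
  } while (z);                      // E: branch back only if eq
  z = (tmp == 0);                   // F: unconditional cmp with 0
  return z;
}

void demo_cas_bool_model() {
  CasModel m;
  int passes = 0;
  m.mem = 6;                                 // differs from oldval by exactly 1
  assert(!cas_bool_model(m, 5, 9, passes));  // reported as a failed CAS
  assert(passes == 1 && m.mem == 6);         // without looping

  passes = 0;
  m.mem = 5;
  m.strex_failures_left = 2;                 // two lost reservations
  assert(cas_bool_model(m, 5, 9, passes));   // retried until the store lands
  assert(passes == 3 && m.mem == 9);
}
```

Under this reading, tmp_reg is compared with 1 only when it holds the STREX status, never when it holds the subtraction result.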

Cheers,
David
-----


On 11/03/2023 8:18 pm, Thomas Stüfe wrote:
> Hi ARM experts,
> 
> I am trying to understand how CAS is implemented on arm; in particular, 
> "MacroAssembler::atomic_cas_bool":
> 
> MacroAssembler::atomic_cas_bool
> 
> ```
>      assert_different_registers(tmp_reg, oldval, newval, base);
>      Label loop;
>      bind(loop);
> A   ldrex(tmp_reg, Address(base, offset));
> B   subs(tmp_reg, tmp_reg, oldval);
> C   strex(tmp_reg, newval, Address(base, offset), eq);
> D   cmp(tmp_reg, 1, eq);
> E   b(loop, eq);
> F   cmp(tmp_reg, 0);
>      if (tmpreg == noreg) {
>        pop(tmp_reg);
>      }
> ```
> 
> It uses LDREX and STREX to perform a cas of *(base+offset) from oldval 
> to newval. It does so in a loop. The code distinguishes two failures: 
> STREX failing, and a "semantically failed" CAS.
> 
> Here is what I think this code does:
> 
> A) LDREX: tmp=*(base+offset)
> B) tmp -= oldvalue
>     If *(base+offset) was unchanged, tmp_reg is now 0 and Z is 1
> C) If Z is 1: STREX the new value: *(base+offset)=newval. Otherwise, omit.
>     After this, if the store succeeded, tmp_reg is 0; if the store 
> failed, it's 1.
> D) Here, tmp_reg is: 0 if the store succeeded, 1 if it failed, 1...n if 
> *(base+offset) had been modified before LDREX.
>     We now compare with 1 and ...
> E) ...repeat the loop if tmp_reg was 1
> 
> So we loop until either *(base+offset) has been changed to some other 
> value concurrently before our LDREX, or our store has succeeded.
> 
> I wondered what the loop guards against. And why it would be okay 
> sometimes to omit it.
> 
> IIUC, STREX fails if the core did lose its exclusive access to the 
> memory location since the LDREX. This can be one of three things, right? :
> - another core slipped in an LDREX+STREX to the same location between 
> our LDREX and STREX
> - Or we context switched to another thread or process. I assume it does 
> a CLREX then, right? Because how could you prevent a sequence like 
> "LDREX(p1) -> switch -> LDREX(p2) -> switch back STREX(p1)" - if I 
> understand the ARM manual [1] correctly, a STREX to a different location 
> than the preceding LDREX is undefined.
> - Or we had a signal after LDREX and did a second LDREX in the signal 
> handler. Does the kernel do a CLREX when invoking a signal handler?
> 
> More questions:
> 
> - If I got it right, at (D), tmp_reg value "1" has two meanings: either 
> STREX failed or some thread increased the value concurrently by 1. We 
> repeat the loop either way. Is this just accepted behavior? Increasing 
> by 1 is maybe not that rare.
> 
> - If I understood this correctly, the loop guards us mainly against 
> context switches. Without the loop a context switch would count as a 
> "semantically failed" CAS. Why would that be okay? Should we not do this 
> loop always?
> 
> - Do we not need an explicit CLREX after the operation? Or does the 
> STREX also clear the hardware monitor? Or does it just not matter?
> 
> - We have VM_Version::supports_ldrex(). Code seems to me sometimes 
> guarded by this (e.g MacroAssembler::atomic_cas_bool), sometimes code 
> just executes ldrex/strex (e.g. the one-shot path of 
> MacroAssembler::cas_for_lock_acquire). Am I mistaken? Or is LDREX now 
> generally available? Does ARMv6 mean STREX and LDREX are available?
> 
> 
> Thanks a lot!
> 
> Cheers, Thomas
> 
> 
> 

