<div dir="ltr"><div dir="ltr">Hi David,<div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Mar 13, 2023 at 3:07 AM David Holmes <<a href="mailto:david.holmes@oracle.com">david.holmes@oracle.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi Thomas,<br>
<br>
I'm far too rusty on the details to answer most of your questions, but:<br>
<br>
> - Do we not need an explicit CLREX after the operation? Or does the<br>
> STREX also clear the hardware monitor? Or does it just not matter?<br>
<br>
STREX clears the reservation so CLREX is not needed. From Arm ARM:<br>
<br>
"A Load-Exclusive instruction marks a small block of memory for <br>
exclusive access. The size of the marked block is IMPLEMENTATION <br>
DEFINED, see Marking and the size of the marked memory block on page <br>
B2-105. A Store-Exclusive instruction to any address in the marked block <br>
clears the marking."<br>
<br>
> - We have VM_Version::supports_ldrex(). Code seems to me sometimes<br>
> guarded by this (e.g MacroAssembler::atomic_cas_bool), sometimes code<br>
> just executes ldrex/strex (e.g. the one-shot path of<br>
> MacroAssembler::cas_for_lock_acquire). Am I mistaken? Or is LDREX now<br>
> generally available? Does ARMv6 mean STREX and LDREX are available?<br>
<br>
I think you will find that where the ldrex guard is missing, the code is <br>
for C2 only and a C2 build is only possible if ldrex is available. (C2 <br>
was only supported on ARMv7+.)<br>
<br>
Also the ARM conditional instructions as used in atomic_cas_bool can <br>
cause confusion when trying to understand the logic. :)<br>
<br></blockquote><div><br></div><div>Thank you, that already helps! Not that rusty, it seems :-)<br></div><div> </div><div>Cheers, Thomas</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
Cheers,<br>
David<br>
-----<br>
<br>
<br>
On 11/03/2023 8:18 pm, Thomas Stüfe wrote:<br>
> Hi ARM experts,<br>
> <br>
> I am trying to understand how CAS is implemented on arm; in particular, <br>
> "MacroAssembler::atomic_cas_bool":<br>
> <br>
> MacroAssembler::atomic_cas_bool<br>
> <br>
> ```<br>
> assert_different_registers(tmp_reg, oldval, newval, base);<br>
> Label loop;<br>
> bind(loop);<br>
> A ldrex(tmp_reg, Address(base, offset));<br>
> B subs(tmp_reg, tmp_reg, oldval);<br>
> C strex(tmp_reg, newval, Address(base, offset), eq);<br>
> D cmp(tmp_reg, 1, eq);<br>
> E b(loop, eq);<br>
> F cmp(tmp_reg, 0);<br>
> if (tmpreg == noreg) {<br>
> pop(tmp_reg);<br>
> }<br>
> ```<br>
> <br>
> It uses LDREX and STREX to perform a CAS of *(base+offset) from oldval <br>
> to newval. It does so in a loop. The code distinguishes two failures: <br>
> STREX failing, and a "semantically failed" CAS.<br>
> <br>
> Here is what I think this code does:<br>
> <br>
> A) LDREX: tmp=*(base+offset)<br>
> B) tmp -= oldvalue<br>
> If *(base+offset) was unchanged, tmp_reg is now 0 and Z is 1<br>
> C) If Z is 1: STREX the new value: *(base+offset)=newval. Otherwise, omit.<br>
> After this, tmp_reg is 0 if the store succeeded and 1 if the store <br>
> failed.<br>
> D) Here, tmp_reg is: 0 if the store succeeded, 1 if it failed, 1...n if <br>
> *(base+offset) had been modified before LDREX.<br>
> We now compare with 1 and ...<br>
> E) ...repeat the loop if tmp_reg was 1<br>
> <br>
> So we loop until either *(base+offset) has been changed to some other <br>
> value concurrently before our LDREX, or until our store succeeds.<br>
> <br>
> I wondered what the loop guards against. And why it would be okay <br>
> sometimes to omit it.<br>
> <br>
> IIUC, STREX fails if the core has lost its exclusive access to the <br>
> memory location since the LDREX. This can happen in one of three ways, right?<br>
> - another core slipped in an LDREX+STREX to the same location between <br>
> our LDREX and STREX<br>
> - Or we context switched to another thread or process. I assume the kernel <br>
> does a CLREX then, right? Otherwise, how could you prevent a sequence like <br>
> "LDREX(p1) -> switch -> LDREX(p2) -> switch back -> STREX(p1)"? If I <br>
> understand the ARM manual [1] correctly, a STREX to a different location <br>
> than the preceding LDREX is undefined.<br>
> - Or we had a signal after LDREX and did a second LDREX in the signal <br>
> handler. Does the kernel do a CLREX when invoking a signal handler?<br>
> <br>
> More questions:<br>
> <br>
> - If I got it right, at (D), tmp_reg value "1" has two meanings: either <br>
> STREX failed or some thread increased the value concurrently by 1. We <br>
> repeat the loop either way. Is this just accepted behavior? Increasing <br>
> by 1 is maybe not that rare.<br>
> <br>
> - If I understood this correctly, the loop guards us mainly against <br>
> context switches. Without the loop a context switch would count as a <br>
> "semantically failed" CAS. Why would that be okay? Should we not do this <br>
> loop always?<br>
> <br>
> - Do we not need an explicit CLREX after the operation? Or does the <br>
> STREX also clear the hardware monitor? Or does it just not matter?<br>
> <br>
> - We have VM_Version::supports_ldrex(). Code seems to me sometimes <br>
> guarded by this (e.g MacroAssembler::atomic_cas_bool), sometimes code <br>
> just executes ldrex/strex (e.g. the one-shot path of <br>
> MacroAssembler::cas_for_lock_acquire). Am I mistaken? Or is LDREX now <br>
> generally available? Does ARMv6 mean STREX and LDREX are available?<br>
> <br>
> <br>
> Thanks a lot!<br>
> <br>
> Cheers, Thomas<br>
> <br>
> <br>
> <br>
</blockquote></div></div>
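<div><br></div>
<div>The step-by-step reading of atomic_cas_bool in the thread can be checked against a small model. The sketch below is a single-threaded toy simulation in Python, not the HotSpot code: it assumes that the <code>eq</code> suffix on lines C and D means "execute only if the Z flag is set", which is how ARM conditional execution normally works. The names <code>Memory</code>, <code>strex_failures</code> and <code>cas</code> are hypothetical and exist only for illustration. 32-bit wraparound of the subtraction is ignored.</div>
<pre>
```python
from dataclasses import dataclass


@dataclass
class Memory:
    value: int               # models *(base+offset)
    strex_failures: int = 0  # hypothetical knob: how many STREX attempts fail


def cas(mem, oldval, newval):
    """Toy model of steps A..F. Returns (z_flag, passes_through_loop)."""
    passes = 0
    while True:
        passes += 1
        tmp = mem.value             # A: ldrex tmp, [base+offset]
        tmp = tmp - oldval          # B: subs tmp, tmp, oldval (sets Z)
        z = (tmp == 0)
        if z:                       # C: strex runs only when Z is set
            if mem.strex_failures != 0:
                mem.strex_failures -= 1
                tmp = 1             # the exclusive store failed
            else:
                mem.value = newval
                tmp = 0             # the exclusive store succeeded
        if z:                       # D: cmp tmp, 1 runs only when Z is set
            z = (tmp == 1)
        if z:                       # E: branch back if eq
            continue
        z = (tmp == 0)              # F: unconditional cmp tmp, 0
        return z, passes


if __name__ == "__main__":
    print(cas(Memory(5), 5, 9))                    # value matches oldval
    print(cas(Memory(6), 5, 9))                    # value is oldval + 1
    print(cas(Memory(5, strex_failures=2), 5, 9))  # two failed stores first
```
</pre>
<div>Under these assumptions, a memory value of oldval + 1 leaves Z clear after step B, so steps C and D are both skipped and the branch at E is not taken. In this model the CAS is reported as failed after a single pass, and only a failed STREX causes a retry.</div>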