Question about CAS via LDREX/STREX on 32-bit arm
Thomas Stüfe
thomas.stuefe at gmail.com
Mon Mar 13 10:07:31 UTC 2023
This whole code sequence lives in MacroAssembler::atomic_cas_bool, but that
function is not always called. There are plenty of direct usages of
ldrex+strex. There is even a condition argument for
MacroAssembler::cas_for_lock_acquire to either do strex directly or to dive
down into atomic_cas_bool.
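
The difference between the two paths can be sketched in portable C++. This is only an analogy under assumptions, not HotSpot code: `cas_one_shot`, `cas_retry` and `demo` are hypothetical names, and `compare_exchange_weak` stands in for a single LDREX/STREX attempt because it is allowed to fail spuriously.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// One attempt only (hypothetical helper). May report failure even though
// the word still holds `expected`, e.g. if the reservation was lost.
inline bool cas_one_shot(std::atomic<uint32_t>& word,
                         uint32_t expected, uint32_t desired) {
    return word.compare_exchange_weak(expected, desired);
}

// Retry loop (hypothetical helper), in the spirit of atomic_cas_bool:
// leave only when the store landed or the value really differs.
inline bool cas_retry(std::atomic<uint32_t>& word,
                      uint32_t expected, uint32_t desired) {
    uint32_t seen = expected;
    while (!word.compare_exchange_weak(seen, desired)) {
        // On failure, `seen` holds the value that was actually observed.
        if (seen != expected) {
            return false;  // semantic failure: somebody changed the word
        }
        // Spurious failure: the word still equals `expected`, try again.
    }
    return true;
}

// Small usage example.
inline void demo() {
    std::atomic<uint32_t> word{0};
    assert(cas_retry(word, 0, 1));   // 0 -> 1 succeeds
    assert(!cas_retry(word, 0, 2));  // word is 1, so this fails for real
    assert(word.load() == 1);
}
```

A caller of the one-shot form has to tolerate a false "failed" result, for example by falling back to a slow path; the retry form never reports failure while the word still holds the expected value.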
On Mon, Mar 13, 2023 at 11:03 AM Reingruber, Richard <
richard.reingruber at sap.com> wrote:
> > Yes, I read it the same way. So we repeat the CAS for lost reservation. I'm
> > interested in when this could happen and why it would be okay to sometimes
> > omit this loop and do the "raw" LDREX-STREX sequence. See my original mail.
>
>
>
> Hm, I don't understand. The loop in the sequence A-F is always there. How
> is it omitted?
>
>
>
>
>
> *From: *Thomas Stüfe <thomas.stuefe at gmail.com>
> *Date: *Monday, 13. March 2023 at 10:54
> *To: *Reingruber, Richard <richard.reingruber at sap.com>
> *Cc: *porters-dev at openjdk.org <porters-dev at openjdk.org>,
> aarch32-port-dev at openjdk.org <aarch32-port-dev at openjdk.org>
> *Subject: *Re: Question about CAS via LDREX/STREX on 32-bit arm
>
> Hi Richard :)
>
>
>
>
>
> On Mon, Mar 13, 2023 at 10:02 AM Reingruber, Richard <
> richard.reingruber at sap.com> wrote:
>
> > Hi ARM experts,
>
>
>
> Hi Thomas, not at all an ARM expert... :)
>
> but I think I understand the code.
>
>
>
> > I am trying to understand how CAS is implemented on arm; in particular,
> > "MacroAssembler::atomic_cas_bool":
>
>
>
> > MacroAssembler::atomic_cas_bool
>
>
>
> > ```
> > assert_different_registers(tmp_reg, oldval, newval, base);
> > Label loop;
> > bind(loop);
> > A ldrex(tmp_reg, Address(base, offset));
> > B subs(tmp_reg, tmp_reg, oldval);
> > C strex(tmp_reg, newval, Address(base, offset), eq);
> > D cmp(tmp_reg, 1, eq);
> > E b(loop, eq);
> > F cmp(tmp_reg, 0);
> > if (tmpreg == noreg) {
> > pop(tmp_reg);
> > }
> > ```
>
>
>
> > It uses LDREX and STREX to perform a cas of *(base+offset) from oldval
> > to newval. It does so in a loop. The code distinguishes two failures: STREX
> > failing, and a "semantically failed" CAS.
>
>
>
> > Here is what I think this code does:
>
>
>
> > A) LDREX: tmp=*(base+offset)
> > B) tmp -= oldvalue
> > If *(base+offset) was unchanged, tmp_reg is now 0 and Z is 1
> > C) If Z is 1: STREX the new value: *(base+offset)=newval. Otherwise, omit.
> > After this, if the store succeeded, tmp_reg is 0, if the store failed
> > it's 1.
> > D) Here, tmp_reg is: 0 if the store succeeded, 1 if it failed, 1...n if
> > *(base+offset) had been modified before LDREX.
> > We now compare with 1 and ...
> > E) ...repeat the loop if tmp_reg was 1
>
>
>
> > So we loop until either *(base+offset) had been changed to some other
> > value concurrently before our LDREX. Or until our store succeeded.
>
>
>
> > I wondered what the loop guards against. And why it would be okay
> > sometimes to omit it.
>
>
>
> The loop is needed to try again if the reservation was lost until the
> STREX succeeds or *(base+offset) != oldvalue.
>
>
>
> So there are two cases. The loop is left iff
>
>
>
> (1) *(base+offset) != oldvalue
>
> (2) the STREX succeeded
>
>
>
> First it is important to understand that C, D, E are only executed if at
> B the eq-condition is set to true.
>
> This is based on the "Conditional Execution" feature of ARM: execution of
> most instructions can be made dependent on a condition (see
> https://developer.arm.com/documentation/den0013/d/ARM-Thumb-Unified-Assembly-Language-Instructions/Instruction-set-basics/Conditional-execution?lang=en
> )
>
>
>
> So in case (1) C, D, E are not executed because B indicates that
> *(base+offset) and oldvalue are not-eq and the loop is left.
>
>
>
>
>
> Ah, thanks, that was my thinking error. I did not realize that CMP was
> also conditional. I assumed the "eq" in the CMP (D) was a condition for the
> CMP. Which makes no sense, as I know now, since CMP just does a sub and
> only needs two arguments. So that meant the full branch CDE was controlled
> from the subtraction result at B.
>
>
>
> That also resolves the "1 has a double meaning" question. It hasn't.
>
>
>
> In case (2) C, D, E are executed. At C, if the reservation from A still
> exists, tmp_reg will be set to 0 otherwise to 1. At E the branch is taken
> if D indicated tmp_reg == 1 (reservation was lost) otherwise the loop is
> left.
>
>
>
> Yes, I read it the same way. So we repeat the CAS for lost reservation.
> I'm interested in when this could happen and why it would be okay to
> sometimes omit this loop and do the "raw" LDREX-STREX sequence. See my
> original mail.
>
>
>
> I suspect it has something to do with context switches. That the kernel
> does a CLREX when we switch, so if we switch between LDREX and STREX, the
> reservation could be lost. But why would it then be okay to ignore this
> sometimes?
>
>
>
> Thanks!
>
>
>
> Thomas
>
>
>
>
>
> Cheers, Richard.
>
>
>
> *From: *porters-dev <porters-dev-retn at openjdk.org> on behalf of Thomas
> Stüfe <thomas.stuefe at gmail.com>
> *Date: *Saturday, 11. March 2023 at 11:19
> *To: *porters-dev at openjdk.org <porters-dev at openjdk.org>,
> aarch32-port-dev at openjdk.org <aarch32-port-dev at openjdk.org>
> *Subject: *Question about CAS via LDREX/STREX on 32-bit arm
>
> Hi ARM experts,
>
> I am trying to understand how CAS is implemented on arm; in particular,
> "MacroAssembler::atomic_cas_bool":
>
> MacroAssembler::atomic_cas_bool
>
>
>
> ```
>
> assert_different_registers(tmp_reg, oldval, newval, base);
> Label loop;
> bind(loop);
> A ldrex(tmp_reg, Address(base, offset));
> B subs(tmp_reg, tmp_reg, oldval);
> C strex(tmp_reg, newval, Address(base, offset), eq);
> D cmp(tmp_reg, 1, eq);
> E b(loop, eq);
> F cmp(tmp_reg, 0);
> if (tmpreg == noreg) {
> pop(tmp_reg);
> }
> ```
>
> It uses LDREX and STREX to perform a cas of *(base+offset) from oldval to
> newval. It does so in a loop. The code distinguishes two failures: STREX
> failing, and a "semantically failed" CAS.
>
> Here is what I think this code does:
>
>
>
> A) LDREX: tmp=*(base+offset)
> B) tmp -= oldvalue
> If *(base+offset) was unchanged, tmp_reg is now 0 and Z is 1
> C) If Z is 1: STREX the new value: *(base+offset)=newval. Otherwise, omit.
> After this, if the store succeeded, tmp_reg is 0, if the store failed
> it's 1.
> D) Here, tmp_reg is: 0 if the store succeeded, 1 if it failed, 1...n if
> *(base+offset) had been modified before LDREX.
> We now compare with 1 and ...
> E) ...repeat the loop if tmp_reg was 1
>
> So we loop until either *(base+offset) had been changed to some other
> value concurrently before our LDREX. Or until our store succeeded.
>
> I wondered what the loop guards against. And why it would be okay
> sometimes to omit it.
>
> IIUC, STREX fails if the core did lose its exclusive access to the memory
> location since the LDREX. This can be one of three things, right? :
> - another core slipped in an LDREX+STREX to the same location between our
> LDREX and STREX
> - Or we context switched to another thread or process. I assume it does a
> CLREX then, right? Because how could you prevent a sequence like "LDREX(p1)
> -> switch -> LDREX(p2) -> switch back STREX(p1)" - if I understand the ARM
> manual [1] correctly, a STREX to a different location than the preceding
> LDREX is undefined.
> - Or we had a signal after LDREX and did a second LDREX in the signal
> handler. Does the kernel do a CLREX when invoking a signal handler?
>
> More questions:
>
> - If I got it right, at (D), tmp_reg value "1" has two meanings: either
> STREX failed or some thread increased the value concurrently by 1. We
> repeat the loop either way. Is this just accepted behavior? Increasing by 1
> is maybe not that rare.
>
> - If I understood this correctly, the loop guards us mainly against
> context switches. Without the loop a context switch would count as a
> "semantically failed" CAS. Why would that be okay? Should we not do this
> loop always?
>
> - Do we not need an explicit CLREX after the operation? Or does the STREX
> also clear the hardware monitor? Or does it just not matter?
>
> - We have VM_Version::supports_ldrex(). Code seems to me sometimes guarded
> by this (e.g. MacroAssembler::atomic_cas_bool), sometimes code just executes
> ldrex/strex (e.g. the one-shot path of
> MacroAssembler::cas_for_lock_acquire). Am I mistaken? Or is LDREX now
> generally available? Does ARMv6 mean STREX and LDREX are available?
>
>
> Thanks a lot!
>
> Cheers, Thomas
>
>
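
For reference, the control flow of the A-F sequence quoted above can be modelled in plain C++, which makes it easy to check that C, D and E only run when B found the values equal. This is a sketch under assumptions, not HotSpot code: `Model`, `lose_reservation`, `iterations` and `atomic_cas_bool_model` are hypothetical names, and the exclusive monitor is reduced to a single flag.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical single-core model of the memory word and its exclusive monitor.
struct Model {
    uint32_t mem = 0;          // stands for *(base+offset)
    bool reserved = false;     // simplified exclusive monitor
    int lose_reservation = 0;  // next N STREX attempts find the reservation gone
    int iterations = 0;        // how often the loop body ran

    uint32_t ldrex() {
        reserved = true;
        return mem;
    }

    // Returns the STREX status: 0 if the store happened, 1 if it did not.
    uint32_t strex(uint32_t value) {
        if (lose_reservation > 0) {
            --lose_reservation;
            reserved = false;  // e.g. cleared between LDREX and STREX
        }
        bool ok = reserved;
        reserved = false;      // the attempt ends the reservation either way
        if (!ok) return 1;
        mem = value;
        return 0;
    }
};

// Mirrors steps A-F; returns the Z flag after F (true means CAS succeeded).
inline bool atomic_cas_bool_model(Model& m, uint32_t oldval, uint32_t newval) {
    uint32_t tmp;
    for (;;) {
        ++m.iterations;
        tmp = m.ldrex();          // A
        tmp -= oldval;            // B: subs, sets Z if the values were equal
        bool eq = (tmp == 0);
        if (eq) {                 // C, D, E are all predicated on "eq"
            tmp = m.strex(newval);  // C: tmp becomes the STREX status
            assert(tmp <= 1);
            eq = (tmp == 1);        // D: cmp tmp_reg, 1
            if (eq) continue;       // E: reservation lost, retry
        }
        break;
    }
    return tmp == 0;              // F: cmp tmp_reg, 0
}
```

With `mem == oldval + 1` the model leaves the loop after a single pass and reports failure, because the value 1 left in tmp by step B is never looked at by D or E.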