<html xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
font-size:10.0pt;
font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:#0563C1;
text-decoration:underline;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
{page:WordSection1;}
--></style>
</head>
<body lang="en-DE" link="#0563C1" vlink="#954F72" style="word-wrap:break-word">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt;mso-fareast-language:EN-US">> Hi ARM experts,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-fareast-language:EN-US">Hi Thomas, not at all an ARM expert... :)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-fareast-language:EN-US">but I think I understand the code.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-fareast-language:EN-US">> I am trying to understand how CAS is implemented on arm; in particular, "MacroAssembler::atomic_cas_bool":<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-fareast-language:EN-US">> MacroAssembler::atomic_cas_bool<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-fareast-language:EN-US">> ```<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-fareast-language:EN-US">> assert_different_registers(tmp_reg, oldval, newval, base);<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-fareast-language:EN-US">> Label loop;<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-fareast-language:EN-US">> bind(loop);<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-fareast-language:EN-US">> A ldrex(tmp_reg, Address(base, offset));<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-fareast-language:EN-US">> B subs(tmp_reg, tmp_reg, oldval);<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-fareast-language:EN-US">> C strex(tmp_reg, newval, Address(base, offset), eq);<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-fareast-language:EN-US">> D cmp(tmp_reg, 1, eq);<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-fareast-language:EN-US">> E b(loop, eq);<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-fareast-language:EN-US">> F cmp(tmp_reg, 0);<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-fareast-language:EN-US">> if (tmpreg == noreg) {<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-fareast-language:EN-US">> pop(tmp_reg);<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-fareast-language:EN-US">> }<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-fareast-language:EN-US">> ```<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-fareast-language:EN-US">> It uses LDREX and STREX to perform a cas of *(base+offset) from oldval to newval. It does so in a loop. The code distinguishes two failures: STREX failing, and a "semantically
failed" CAS.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-fareast-language:EN-US">> Here is what I think this code does:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-fareast-language:EN-US">> A) LDREX: tmp=*(base+offset)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-fareast-language:EN-US">> B) tmp -= oldvalue
<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-fareast-language:EN-US">> If *(base+offset) was unchanged, tmp_reg is now 0 and Z is 1<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-fareast-language:EN-US">> C) If Z is 1: STREX the new value: *(base+offset)=newval. Otherwise, omit.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-fareast-language:EN-US">> After this, if the store succeeded, tmp_reg is 0; if the store failed, it's 1.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-fareast-language:EN-US">> D) Here, tmp_reg is: 0 if the store succeeded, 1 if it failed, 1...n if *(base+offset) had been modified before LDREX.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-fareast-language:EN-US">> We now compare with 1 and ...<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-fareast-language:EN-US">> E) ...repeat the loop if tmp_reg was 1<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-fareast-language:EN-US">> So we loop until either *(base+offset) had been changed to some other value concurrently before our LDREX, or until our store succeeded.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-fareast-language:EN-US">> I wondered what the loop guards against. And why it would be okay sometimes to omit it.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-fareast-language:EN-US">The loop is needed to retry whenever the reservation was lost. It repeats until either the STREX succeeds or *(base+offset) != oldvalue.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-fareast-language:EN-US">So there are two cases. The loop is exited iff one of the following holds:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-fareast-language:EN-US">(1) *(base+offset) != oldvalue<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-fareast-language:EN-US">(2) the STREX succeeded<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-fareast-language:EN-US">First, it is important to understand that C, D, and E are only executed if B leaves the eq condition true (Z = 1).<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-fareast-language:EN-US">This is based on the "Conditional Execution"
</span><span lang="DE" style="font-size:11.0pt;mso-fareast-language:EN-US">f</span><span style="font-size:11.0pt;mso-fareast-language:EN-US">eature of ARM: execution of most instructions can be made dependent on a condition (see
<a href="https://developer.arm.com/documentation/den0013/d/ARM-Thumb-Unified-Assembly-Language-Instructions/Instruction-set-basics/Conditional-execution?lang=en">
https://developer.arm.com/documentation/den0013/d/ARM-Thumb-Unified-Assembly-Language-Instructions/Instruction-set-basics/Conditional-execution?lang=en</a>)<o:p></o:p></span></p>
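<p class="MsoNormal"><span style="font-size:11.0pt;mso-fareast-language:EN-US">As a minimal sketch of this predication, the following C++ models only the Z flag for steps B to E. The function name would_loop and its strex_status parameter are hypothetical and not part of HotSpot. It only illustrates that a nonzero difference at B skips C, D, and E, so a value that differs from oldval by exactly 1 should not trigger a retry.<o:p></o:p></span></p>
<pre>
```cpp
// Hypothetical model of steps B..E; only the Z flag is tracked.
// strex_status is the value STREX would write (0 = stored, 1 = lost
// reservation); it is ignored when C is skipped.
bool would_loop(unsigned loaded, unsigned oldval, unsigned strex_status) {
  unsigned tmp = loaded - oldval;  // B: subs, sets Z iff result is zero
  bool z = (tmp == 0);
  if (z) tmp = strex_status;       // C (eq): strex overwrites tmp, flags unchanged
  if (z) z = (tmp == 1);           // D (eq): cmp tmp, 1 runs only if Z was set
  return z;                        // E (eq): branch back iff Z is set
}
```
</pre>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-fareast-language:EN-US">In this model, would_loop(6, 5, 0) is false because D never runs when B yields a nonzero result, while would_loop(5, 5, 1) is true because the simulated STREX reported a lost reservation.<o:p></o:p></span></p>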
<p class="MsoNormal"><span style="font-size:11.0pt;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-fareast-language:EN-US">So in case (1), C, D, and E are not executed because B indicates that *(base+offset) and oldvalue are not equal, and the loop is exited.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-fareast-language:EN-US">In case (2), C, D, and E are executed. At C, tmp_reg is set to 0 if the reservation from A still exists, otherwise to 1. At E, the branch is taken if D indicated tmp_reg == 1 (reservation was lost); otherwise the loop is exited.<o:p></o:p></span></p>
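<p class="MsoNormal"><span style="font-size:11.0pt;mso-fareast-language:EN-US">The two exit cases can be sketched with a small C++ model. Cell, ldrex, strex, and cas_bool here are hypothetical stand-ins that simulate a one-bit exclusive monitor in plain software. They are not the HotSpot MacroAssembler code, and the interruptions counter is only an assumed way to simulate a lost reservation.<o:p></o:p></span></p>
<pre>
```cpp
// Hypothetical single-word memory with a one-bit exclusive monitor.
struct Cell {
  unsigned value;
  bool reserved;      // set by ldrex, cleared by strex or by interference
  int interruptions;  // how many times the model drops the reservation
};

// Load the value and take the reservation.
unsigned ldrex(Cell* c) {
  c->reserved = true;
  return c->value;
}

// Returns 0 if the store happened, 1 if the reservation was lost.
unsigned strex(Cell* c, unsigned newval) {
  if (c->interruptions != 0) {  // simulate e.g. a context switch
    c->interruptions = c->interruptions - 1;
    c->reserved = false;
  }
  if (c->reserved == false) return 1;
  c->value = newval;
  c->reserved = false;
  return 0;
}

// Mirrors steps A..F; returns true iff newval was stored.
bool cas_bool(Cell* c, unsigned oldval, unsigned newval) {
  for (;;) {
    unsigned tmp = ldrex(c);     // A
    tmp = tmp - oldval;          // B: Z = (tmp == 0)
    if (tmp != 0) return false;  // case (1): C, D, E are skipped
    tmp = strex(c, newval);      // C
    if (tmp == 1) continue;      // D, E: reservation lost, retry
    return tmp == 0;             // F: case (2), the store succeeded
  }
}
```
</pre>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-fareast-language:EN-US">With a cell holding 5 and two simulated interruptions, cas_bool with oldval 5 and newval 9 retries twice and then stores 9. With a cell holding 6, it returns false without attempting the store.<o:p></o:p></span></p>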
<p class="MsoNormal"><span style="font-size:11.0pt;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-fareast-language:EN-US">Cheers, Richard.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal" style="margin-bottom:12.0pt"><b><span style="font-size:12.0pt;color:black">From:
</span></b><span style="font-size:12.0pt;color:black">porters-dev &lt;porters-dev-retn@openjdk.org&gt; on behalf of Thomas Stüfe &lt;thomas.stuefe@gmail.com&gt;<br>
<b>Date: </b>Saturday, 11. March 2023 at 11:19<br>
<b>To: </b>porters-dev@openjdk.org &lt;porters-dev@openjdk.org&gt;, aarch32-port-dev@openjdk.org &lt;aarch32-port-dev@openjdk.org&gt;<br>
<b>Subject: </b>Question about CAS via LDREX/STREX on 32-bit arm<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt">Hi ARM experts,<br>
<br>
I am trying to understand how CAS is implemented on arm; in particular, "MacroAssembler::atomic_cas_bool":<br>
<br>
MacroAssembler::atomic_cas_bool<o:p></o:p></span></p>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt">```<o:p></o:p></span></p>
</div>
<p class="MsoNormal" style="margin-bottom:12.0pt"><span style="font-size:11.0pt"> assert_different_registers(tmp_reg, oldval, newval, base);<br>
Label loop;<br>
bind(loop);<br>
A ldrex(tmp_reg, Address(base, offset));<br>
B subs(tmp_reg, tmp_reg, oldval);<br>
C strex(tmp_reg, newval, Address(base, offset), eq);<br>
D cmp(tmp_reg, 1, eq);<br>
E b(loop, eq);<br>
F cmp(tmp_reg, 0);<br>
if (tmpreg == noreg) {<br>
pop(tmp_reg);<br>
}<br>
```<br>
<br>
It uses LDREX and STREX to perform a cas of *(base+offset) from oldval to newval. It does so in a loop. The code distinguishes two failures: STREX failing, and a "semantically failed" CAS.<o:p></o:p></span></p>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt">Here is what I think this code does:<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
</div>
<p class="MsoNormal" style="margin-bottom:12.0pt"><span style="font-size:11.0pt">A) LDREX: tmp=*(base+offset)<br>
B) tmp -= oldvalue <br>
If *(base+offset) was unchanged, tmp_reg is now 0 and Z is 1<br>
C) If Z is 1: STREX the new value: *(base+offset)=newval. Otherwise, omit.<br>
After this, if the store succeeded, tmp_reg is 0; if the store failed, it's 1.<br>
D) Here, tmp_reg is: 0 if the store succeeded, 1 if it failed, 1...n if *(base+offset) had been modified before LDREX.<br>
We now compare with 1 and ...<br>
E) ...repeat the loop if tmp_reg was 1<br>
<br>
So we loop until either *(base+offset) had been changed to some other value concurrently before our LDREX, or until our store succeeded.<br>
<br>
I wondered what the loop guards against. And why it would be okay sometimes to omit it.<br>
<br>
IIUC, STREX fails if the core lost its exclusive access to the memory location since the LDREX. This can be one of three things, right?<br>
- another core slipped in an LDREX+STREX to the same location between our LDREX and STREX<br>
- Or we context switched to another thread or process. I assume it does a CLREX then, right? Because how could you prevent a sequence like "LDREX(p1) -> switch -> LDREX(p2) -> switch back STREX(p1)" - if I understand the ARM manual [1] correctly, a STREX to
a different location than the preceding LDREX is undefined.<br>
- Or we had a signal after LDREX and did a second LDREX in the signal handler. Does the kernel do a CLREX when invoking a signal handler?<br>
<br>
More questions:<br>
<br>
- If I got it right, at (D), tmp_reg value "1" has two meanings: either STREX failed or some thread increased the value concurrently by 1. We repeat the loop either way. Is this just accepted behavior? Increasing by 1 is maybe not that rare.<br>
<br>
- If I understood this correctly, the loop guards us mainly against context switches. Without the loop a context switch would count as a "semantically failed" CAS. Why would that be okay? Should we not do this loop always?<br>
<br>
- Do we not need an explicit CLREX after the operation? Or does the STREX also clear the hardware monitor? Or does it just not matter?<br>
<br>
- We have VM_Version::supports_ldrex(). Code seems to me sometimes guarded by this (e.g. MacroAssembler::atomic_cas_bool), sometimes code just executes ldrex/strex (e.g. the one-shot path of MacroAssembler::cas_for_lock_acquire). Am I mistaken? Or is LDREX now
generally available? Does ARMv6 mean STREX and LDREX are available?<br>
<br>
<br>
Thanks a lot!<br>
<br>
Cheers, Thomas<br>
<br>
<br>
<o:p></o:p></span></p>
</div>
</div>
</body>
</html>