[aarch64-port-dev ] 8133352: aarch64: generates constrained unpredictable instructions

Edward Nevill edward.nevill at gmail.com
Wed Aug 12 16:23:32 UTC 2015


On Tue, 2015-08-11 at 09:55 -0700, Vladimir Kozlov wrote:
> I think it depends on how expensive push/pop is on arm64.
> In C2 generated code you may introduce spills to the stack around the GetAndAdd code since you use an additional register (in .ad). So you are saving to the stack anyway.
> On the other hand, your changes (third temp) are not so big and I think they are acceptable.
> On 8/11/15 8:57 AM, Edward Nevill wrote:
> > Hi,
> >
> > Webrev http://cr.openjdk.java.net/~enevill/8133352/

Hi Vladimir,

Thanks for that. Another possibility is to use the inverse operation to restore the result after it has been clobbered by the store-exclusive status write.

E.g.:

-#define ATOMIC_OP(LDXR, OP, STXR)                                       \
+#define ATOMIC_OP(LDXR, OP, IOP, STXR)                                       \
 void MacroAssembler::atomic_##OP(Register prev, RegisterOrConstant incr, Register addr) { \
   Register result = rscratch2;                                          \
   if (prev->is_valid())                                                 \
@@ -2120,14 +2125,15 @@
   bind(retry_load);                                                     \
   LDXR(result, addr);                                                   \
   OP(rscratch1, result, incr);                                          \
-  STXR(rscratch1, rscratch1, addr);                                     \
-  cbnzw(rscratch1, retry_load);                                         \
-  if (prev->is_valid() && prev != result)                               \
-    mov(prev, result);                                                  \
+  STXR(rscratch2, rscratch1, addr);                                     \
+  cbnzw(rscratch2, retry_load);                                         \
+  if (prev->is_valid() && prev != result) {                             \
+    IOP(prev, rscratch1, incr);                                         \
+  }                                                                     \
 }

-ATOMIC_OP(ldxr, add, stxr)
-ATOMIC_OP(ldxrw, addw, stxrw)
+ATOMIC_OP(ldxr, add, sub, stxr)
+ATOMIC_OP(ldxrw, addw, subw, stxrw)

This essentially creates the extra register we need: rscratch2 is free to receive the store-exclusive status because the inverse operation (IOP) can reconstruct the old value from rscratch1 afterwards.
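
For concreteness, this is roughly the sequence the macro now emits for atomic_add (a sketch only; it assumes rscratch1 = x8 and rscratch2 = x9, with addr in x2, prev in x0, and a constant increment of 1):

  retry:
    ldxr  x9, [x2]        // x9 (rscratch2) = old value
    add   x8, x9, #1      // x8 (rscratch1) = old value + incr
    stxr  w9, x8, [x2]    // status -> w9, clobbering the old value
    cbnz  w9, retry
    sub   x0, x8, #1      // prev = (old + incr) - incr = old value

The old sequence ended with stxr w8, x8, [x2], i.e. the same register as both the status result and the store data, which the ARM ARM classifies as CONSTRAINED UNPREDICTABLE.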

It doesn't win any beauty contests, but it is probably optimal.

All the best,
Ed.
