[aarch64-port-dev ] RFD: scratch registers cleanup [Long post]

Thu Oct 31 08:04:23 UTC 2019

Hi Andrew,

In general I think this is an improvement. I've put some comments below.

> 
> However, things haven't quite worked out that way. In practice people
> "know" that the macro WhizzyMacro they're using only clobbers
> rscratch1 and so rscratch2 will be "safe" and its contents will be
> preserved across WhizzyMacro. The danger of this is obvious: a
> maintenance programmer less familiar with the code base will come
> along and use some other registers. This is particularly problematic
> when a macro is used in many places, and if there are several versions
> of a macro, as happens in the case of GC barriers.

Do you think this partly stems from having *two* scratch registers, 
where one or the other is often not clobbered? The 32-bit Arm ports gets 
away with just a single Rtmp, and perhaps in that case programmers are 
more likely to think it's always clobbered.

> 
> There is another risk: aliasing, the extreme sport of referring to the
> same register in a macro by more than one name. Again, the risk is
> obvious: the poor programmer won't know that the registers "tmp" and
> "rscratch2" are in fact the same register.

This can be mitigated somewhat if the macros name all the registers they 
use up-front and then do an assert_different_registers(...).

> 
> Bear in mind that it is sometimes convenient to avoid declaring
> scratch registers in macros altogether. You can do this by passing all
> of the temporary registers a macro needs explicitly as arguments, and
> this can make the code smaller and simpler.

Yes, it makes the macro much easier to understand in isolation.

As someone new to the codebase I would be a bit surprised that seemingly 
primitive operation like cmpxchg, increment, compare_eq, etc. clobber 
scratch registers. I think readability would improve if we added 
explicit temporary arguments to these macros. So that the majority of 
MacroAssembler::* then doesn't use the implicit scratch registers at 
all. Maybe we can try this as an exercise and see what's left over? I 
guess it wouldn't increase verbosity all that much.

> 
>        {
>          ScratchRegister rscratch1(__ as(), r8);
>          __ ldrb(rscratch1, gc_state);
>          __ tbz(rscratch1, ShenandoahHeap::MARKING_BITPOS, done);
>        }
> 
> 
> [NB: it is possible to give the scratch register a name other than
> rscratch1, but please don't!]

It's also possible to do `ScratchRegister rscratch1(__ as(), r9)' which 
is perhaps more confusing.

> 
> Let's say you need to call InnerMacro from WhizzyMacro, and both
> InnerMacro and WhizzyMacro need scratch registers. You can do this by
> explicitly freeing the scratch registers, like so:
> 
>    WhizzyMacro() {
>      ...
>      ScratchRegister rscratch1(__ as(), r8);
>      // Code using rscratch1 ...
>      {
>        FreeScratchRegs dummy(__ as());
>        __ InnerMacro();
>        // Try to use rscratch1 here and you'll get an assertion
>        // because you don't own it.
>      }
>      // You can use rscratch1 again here.
>      ...
>    }
> 
> In this case it is clear to the reader that InnerMacro() may clobber
> both scratch registers. At the end of the scope of FreeScratchRegs,
> rscratch1 is again owned by WhizzyMacro.

But you could equally write it with two separate ScratchRegister scopes 
and avoid FreeScratchRegs altogether:

   WhizzyMacro() {
     ...
     {
       ScratchRegister rscratch1(__ as(), r8);
       // Code using rscratch1 ...
     }
     __ InnerMacro();
     // Try to use rscratch1 here and you'll get an assertion
     // because you don't own it.
     {
       ScratchRegister rscratch1(__ as(), r8);
       // You can use rscratch1 again here.
       ...
     }
   }

Visually this makes it clear to the reader that the two "rscratch1"s are 
unrelated and not preserved across InnerMacro. Looking at the patch I 
can see why FreeScratchRegs helps minimise changes to the existing code, 
but it still feels like a bit of a kludge.

> 
> Webrev at http://cr.openjdk.java.net/~aph/scratch-regs/webrev/
> 

I couldn't apply that cleanly on top of jdk/jdk. Maybe needs rebasing?

src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp

+class ScratchRegister {
+  Register _r;
+  MacroAssembler *_masm;

I think this class can inherit from StackObj to disallow new/delete, 
like the other RAII types in Hotspot do.

+  ScratchRegister(const ScratchRegister &r) {
+    _r = r._r;
+    _masm = NULL;
+  }

Do we really need to support copying? It seems to complicate the model 
quite a lot and has some issues e.g. if the copy outlives the original. 
I can only see it being used in the new cmpw overload - do we actually 
call that?

src/hotspot/cpu/aarch64/interp_masm_aarch64.hpp

-    set_last_Java_frame(esp, rfp, (address) pc(), rscratch1);
+    set_last_Java_frame(esp, rfp, (address) pc(), r8);

This is a bit icky. Can we move the body of _call_Unimplemented into the 
.cpp file where there is already a global constant for rscratch1?

src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp

+void tassert(bool shouldBeTrue, const char *msg) {
+  if (! shouldBeTrue) {
+    fprintf(stderr, "%s\n", msg);
+  }
+}

This isn't used?

 > Finally, I'm not wedded to the current syntax for declarations: I can
 > think of some nicer ways to do it so that you could say
 >
 >    use_scratch(rscratch1);
 >
 > rather than
 >
 >    ScratchRegister rscratch1(__ as(), r8);

I think it's OK as-is, and preferable to anything that uses 
pre-processor magic. Maybe something like the following using a move 
constructor for ScratchRegister would be prettier:

   ScratchRegister rscratch1(__ scratch_register(r8));

Thanks,
Nick