[aarch64-port-dev ] RFD: scratch registers cleanup [Long post]

Thu Oct 31 09:21:36 UTC 2019

Hi,

On 10/31/19 8:04 AM, Nick Gasson wrote:
> 
> In general I think this is an improvement. I've put some comments below.
> 
>> However, things haven't quite worked out that way. In practice people
>> "know" that the macro WhizzyMacro they're using only clobbers
>> rscratch1 and so rscratch2 will be "safe" and its contents will be
>> preserved across WhizzyMacro. The danger of this is obvious: a
>> maintenance programmer less familiar with the code base will come
>> along and use some other registers. This is particularly problematic
>> when a macro is used in many places, and if there are several versions
>> of a macro, as happens in the case of GC barriers.
> 
> Do you think this partly stems from having *two* scratch registers, 
> where one or the other is often not clobbered? The 32-bit Arm ports gets 
> away with just a single Rtmp, and perhaps in that case programmers are 
> more likely to think it's always clobbered.

It's possible, but there are many cases where the code would be worse
with only one.

>> There is another risk: aliasing, the extreme sport of referring to the
>> same register in a macro by more than one name. Again, the risk is
>> obvious: the poor programmer won't know that the registers "tmp" and
>> "rscratch2" are in fact the same register.
> 
> This can be mitigated somewhat if the macros name all the registers they 
> use up-front and then do an assert_different_registers(...).

I think you're missing my point: this aliasing has been done
*deliberately*. So the assert_different_registers(...) isn't there.

>> Bear in mind that it is sometimes convenient to avoid declaring
>> scratch registers in macros altogether. You can do this by passing all
>> of the temporary registers a macro needs explicitly as arguments, and
>> this can make the code smaller and simpler.
> 
> Yes, it makes the macro much easier to understand in isolation.
> 
> As someone new to the codebase I would be a bit surprised that seemingly 
> primitive operation like cmpxchg, increment, compare_eq, etc. clobber 
> scratch registers.

That ease of use was the reason for having the scratch registers: that
they were always free, and any macro could use them. IMO this is a
desirable property if it can be used safely, and I believe it can be.

> I think readability would improve if we added explicit temporary
> arguments to these macros. So that the majority of MacroAssembler::*
> then doesn't use the implicit scratch registers at all. Maybe we can
> try this as an exercise and see what's left over? I guess it
> wouldn't increase verbosity all that much.

I disagree. It wouldn't much help readability or ease of maintenance.
Someone would have to be responsible at the top level for choosing
which scratch registers to be passed down. But how do you know which
is and is not a "top level" macro? Sometimes what was once a "top
level" macro becomes a lower-level one.

I'm proposing that we handle it automatically.

>>        {
>>          ScratchRegister rscratch1(__ as(), r8);
>>          __ ldrb(rscratch1, gc_state);
>>          __ tbz(rscratch1, ShenandoahHeap::MARKING_BITPOS, done);
>>        }
>>
>>
>> [NB: it is possible to give the scratch register a name other than
>> rscratch1, but please don't!]
> 
> It's also possible to do `ScratchRegister rscratch1(__ as(), r9)' which 
> is perhaps more confusing.

The penalty for that should be severe, but I would like to try to make
it impossible.

>> Let's say you need to call InnerMacro from WhizzyMacro, and both
>> InnerMacro and WhizzyMacro need scratch registers. You can do this by
>> explicitly freeing the scratch registers, like so:
>>
>>    WhizzyMacro() {
>>      ...
>>      ScratchRegister rscratch1(__ as(), r8);
>>      // Code using rscratch1 ...
>>      {
>>        FreeScratchRegs dummy(__ as());
>>        __ InnerMacro();
>>        // Try to use rscratch1 here and you'll get an assertion
>>        // because you don't own it.
>>      }
>>      // You can use rscratch1 again here.
>>      ...
>>    }
>>
>> In this case it is clear to the reader that InnerMacro() may clobber
>> both scratch registers. At the end of the scope of FreeScratchRegs,
>> rscratch1 is again owned by WhizzyMacro.
> 
> But you could equally write it with two separate ScratchRegister scopes 
> and avoid FreeScratchRegs altogether:
> 
>    WhizzyMacro() {
>      ...
>      {
>        ScratchRegister rscratch1(__ as(), r8);
>        // Code using rscratch1 ...
>      }
>      __ InnerMacro();
>      // Try to use rscratch1 here and you'll get an assertion
>      // because you don't own it.
>      {
>        ScratchRegister rscratch1(__ as(), r8);
>        // You can use rscratch1 again here.
>        ...
>      }
>    }

Sure, and in such a simple case I would, but it's a very simple case
for illustration. Maybe I should have done something more complicated,
but it would obfuscate the example.

> Visually this makes it clear to the reader that the two "rscratch1"s
> are unrelated and not preserved across InnerMacro. Looking at the
> patch I can see why FreeScratchRegs helps minimise changes to the
> existing code, but it still feels like a bit of a kludge.

We have to be realistic, and not let the best be the enemy of the good
enough. This patch introduces a new facility rather and uses it,
rather than what would be a fairly large-scale reorganization of the
code base.

There are many cases where we have deep nesting: for example, one
macro calls another, and there are many nested scopes, and in the
middle someone needs a scratch register. One common case is where a
macro calls a macro calls a macro, and there is a callout to the
runtime which needs a scratch register.

I can (almost) guarantee that this patch won't change the generated
code and therefore won't break anything. That is an important
guarantee at this stage. The chances of rewriting all macros (to pass
scratch registers explicitly) without making mistakes and breaking
things in very hard-to-debug ways is asymptotically close to zero.

>> Webrev at http://cr.openjdk.java.net/~aph/scratch-regs/webrev/
> 
> I couldn't apply that cleanly on top of jdk/jdk. Maybe needs rebasing?

OK, will do. It's not very old, but things move fast.

> src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp
> 
> +class ScratchRegister {
> +  Register _r;
> +  MacroAssembler *_masm;
> 
> I think this class can inherit from StackObj to disallow new/delete, 
> like the other RAII types in Hotspot do.
> 
> +  ScratchRegister(const ScratchRegister &r) {
> +    _r = r._r;
> +    _masm = NULL;
> +  }
> 
> Do we really need to support copying? It seems to complicate the
> model quite a lot and has some issues e.g. if the copy outlives the
> original.  I can only see it being used in the new cmpw overload -
> do we actually call that?

You're right, we could do wihtout that.

> src/hotspot/cpu/aarch64/interp_masm_aarch64.hpp
> 
> -    set_last_Java_frame(esp, rfp, (address) pc(), rscratch1);
> +    set_last_Java_frame(esp, rfp, (address) pc(), r8);
> 
> This is a bit icky. Can we move the body of _call_Unimplemented into the 
> .cpp file where there is already a global constant for rscratch1?

That seems reasonable.

> src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp
> 
> +void tassert(bool shouldBeTrue, const char *msg) {
> +  if (! shouldBeTrue) {
> +    fprintf(stderr, "%s\n", msg);
> +  }
> +}
> 
> This isn't used?

It's all Work In Progress.  :-)

>  > Finally, I'm not wedded to the current syntax for declarations: I can
>  > think of some nicer ways to do it so that you could say
>  >
>  >    use_scratch(rscratch1);
>  >
>  > rather than
>  >
>  >    ScratchRegister rscratch1(__ as(), r8);
> 
> I think it's OK as-is, and preferable to anything that uses 
> pre-processor magic. Maybe something like the following using a move 
> constructor for ScratchRegister would be prettier:
> 
>    ScratchRegister rscratch1(__ scratch_register(r8));

OK, I'll give that some more thought. I'm still thinking I need some
automatic way to name the scratch registers correctly so that some
silly programmer can't do the Wrong Thing.

Thank you for such a thoughtful and detailed reply. It must have taken
some time to write, and it is very helpful.

-- 
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671