[aarch64-port-dev ] RFD: scratch registers cleanup [Long post]
Andrew Dinn
adinn at redhat.com
Tue Nov 5 12:24:02 UTC 2019
Hi Andrew,
This is a good start and it could be used as is. However,I think it
needs some polishing to improve the way it works. That could involve
upgrading the model for how scratch registers are declared, allocated,
released and re-allocated. Alternatively, it could just involve
improving the way the current definitions are employed.
I'll start with general comments which address that broader point before
going on to mention specific issues to do with the current patch. These
latter comments will also help to motivate the general point but I think
it is better to make them in context. See after the sig for these comments.
regards,
Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill
----- 8< -------- 8< -------- 8< -------- 8< -------- 8< ---
I think this is a very good start to remedying a nasty problem. However,
I'm not yet convinced the model for how you manage and use scratch
registers is the best way to do this. I'll explain why below and suggest
a variation that might or might not work. If it doesn't then correct me
and that will at least help understand why you ended up with the current
design. If it does work or, at least, suggests how to move towards
something better then let's try another round.
What troubles me most of all is that the mechanism you have provided can
be quite opaque about ownership/liveness of registers. I think that
happens because at root it does not require that declaration +
allocation, release and subsequent re-allocation are co-located in the
same program scope (or,at least, in the same method).
Let's work through some examples to clarify that view. As it stands it
is possible to allocate some scratch registers in a caller method and
then release those registers in a called method (possibly more than one
call down the stack) -- indeed, the latter operation occurs in this code
from the patch:
c1_LIRAssembler_aarch64.cpp:
960 void LIR_Assembler::mem2reg(LIR_Opr src, LIR_Opr dest, BasicType
type, LIR_PatchCode patch_code, CodeEmitInfo* info, bool wide, bool /*
unaligned */) {
961 LIR_Address* addr = src->as_address_ptr();
. . .
976 int null_check_here = code_offset();
977 FreeScratchRegs dummy(__ as());
. . .
ote that we are not in the scope of a visible ScratchRegister
declaration here. So, the use fo a FreeScratchRegs declaration implies
that we may be in the scope of a declaration up the call chain. If that
may be the case then how can the callee safely cancel that allocation?
How can it be sure that the caller is not relying on a scratch register
retaining its value across the call.
Likewise from the POV of the caller which might have declared a
ScratchRegister there is nothing in the code (except perhaps for
comments) to indicate that said scratch register might be overwritten
under the call to this method (or perhaps even a call to an indirect
caller). To those who do not know the details of the callee a
ScratchRegister declared in the caller will appear to remain valid
across the call even though it may be overwritten.
Another example occurs later in the same file
1464 void LIR_Assembler::emit_opTypeCheck(LIR_OpTypeCheck* op) {
1465 const bool should_profile = op->should_profile();
1466
1467 LIR_Code code = op->code();
1468 if (code == lir_store_check) {
. . .
1495 if (should_profile) {
1496 Label not_null;
1497 __ cbnz(value, not_null);
1498 // Object is null; update MDO and exit
1499 ScratchRegister rscratch1(__ as(), r8);
1500 ScratchRegister rscratch2(__ as(), r9); . . .
. . .
1554 } else if (code == lir_checkcast) {
1555 FreeScratchRegs dummy(__ as());
. . .
1564 } else if (code == lir_instanceof) {
1565 FreeScratchRegs dummy(__ as());
. . .
Once again the release operations occur in a scope where there is no
prior declaration in the same scope. So, this use of release
declarations in the else-if clauses implies that a caller may have
allocated a scratch register yet they presume it is safe to release it.
Meanwhile, to make matters worse, the ScratchRegister declarations in
the initial if clause imply that a caller which exercises this path will
/not/ have declared a scratch register. That further implies that the
validity of these combined assumptions about what declarations might be
in scope depends on the callee's conditional logic. Now that's starting
to get a tad too opaque and I fear we are heading down a garden path.
Of course, this may just be down to the fact that the inclusion of those
release operations was superfluous; that they could be removed from
these examples. However, I don't like the fact that one can construct
such examples. I would much prefer it if this usage was avoided where
possible by requiring the release to occur in the scope where the
scratch register was declared. i.e. the caller must arrange to free
scratch registers if a caller might need them. That would make it much
easier to detect circumstances where the caller was trying to have its
cake and eat it.
This model of usage is indeed what happens much of the time e.g. also
from the same file:
1280 void LIR_Assembler::type_profile_helper(Register mdo,
1281 ciMethodData *md,
ciProfileData *data,
1282 Register recv, Label*
update_done) {
1283 ScratchRegister rscratch1(__ as(), r8);
1284 ScratchRegister rscratch2(__ as(), r9);
. . .
1293 {
1294 FreeScratchRegs dummy(__ as());
. . .
1298 }
. . .
With this usage it is very clear that any scratch value established
before line 1294 cannot be relied on in code after line 1298.
I suspect that there are circumstances where a caller can or even must
free scratch registers down the call tree from a allocation so perhaps
we cannot afford always to dispense with this current usage for
FreeScratchRegs altogether. However, I'd prefer it if an explicit
scope-local release was used for wherever possible to avoid this
2nd-guessing of callers' intentions. In which case, it would be much
better if we could use the grammar of language definitions to support
that .. .
. . . which also provides the opportunity to remedy another shortcoming.
The current release operation does not specify which scratch register(s)
is (are) to be released. The only option is free all registers. However,
if a release operation must always be located in the scope of a
ScratchRegister then it is perfectly possible to do so by passing the
relevant ScratchRegister as a parameter. So, perhaps we could use macros
like:
FreeScratchRegister dummy(rscratch1)
FreeScratchRegisters dummy(rscratch1, rscratch2);
That makes it harder for scratch registers to be freed down the stack in
a caller because the caller will not have access to rscratch1 and
rscratch2 unless they are passed in by reerence or pointer. I'm assuming
code won't pass a ScratchRegister as a call argument, merely the
encapsulated Register (we might perhaps be able to knobble that by
ensuring that copying/dereferencing reset the register to noreg).
If this revised model works then the current FreeScratchRegs really
needs renaming to something less appealing like
DeleteCallerScratchBindings that won't tempt anyone to use it.
Also, it would perhaps be useful to provide a reverse operation for
re-allocating a previouly declared and freed scratch register e.g.
public void Foo::bar(...)
ScratchRegister rscratch1(...);
ScratchRegister rscratch2(...);
. . .
<use rscratch1 and rscratch2>
. . .
FreeScratchRegister dummy(rscratch1);
call_user_of_rscratch1(...);
UseScratchRegister dummy2(rscratch1);
...
This explicitly acknowledges that the call may want to use rsctratch1
and explicitly signals in the caller code that rscratch1 cannot be
expected to remain constant across the call. Of course, one can always
achieve the same effect with block nesting but I think this is neater.
There is a bit more to this to do with conditional allocation of a
scratch register that I will discuss in the detailed comments (for
c1_LIRAssembler) but that needs to be done in context so ...
... on to the specific comments:
aarch64.ad:
Firstly, just a humdrum error. You missed a conversion for a specific
use of rscratch1
2851 enc_class aarch64_enc_stlrb(iRegI src, memory mem) %{
2852 MOV_VOLATILE(as_Register($src$$reg), $mem$$base, $mem$$index,
$mem$$scale, $mem$$disp,
2853 rscratch1, stlrb);
2854 %}
This is still passing rscratch1 in the MOV_VOLATILE macro call where
subsequent encoding classes use r8_scratch. It doesn't actually matter
(i.e. it doesn't fail to compile) because MOV_VOLATILE ignores the
SCRATCH argument. I know you are trying to minimize changes but dropping
that argument in all uses would be a whole lot better . . . yes, maybe
in the next patch.
This story continues . . .
2930 enc_class aarch64_enc_fldars(vRegF dst, memory mem) %{
2931 MOV_VOLATILE(r8_scratch, $mem$$base, $mem$$index, $mem$$scale,
$mem$$disp,
2932 r8_scratch, ldarw);
2933 __ fmovs(as_FloatRegister($dst$$reg), r8_scratch);
2934 %}
I'm not sure why you invented the name r8_scratch as an alias for r8.
Does that really help? I couldn't discern why you sometimes used one and
why sometimes the other. If there is a rationale it probaly ought to be
documented somewhere (at least at the point where r8_scratch aand
r9_scratch are declared).
Anyway, what I don't follow is why are these uses just employ a bare
register name. Why are they exempt from the need to declare a scratch
register for r8 before using it?
Why not:
enc_class aarch64_enc_fldars(vRegF dst, memory mem) %{
ScratchRegister rscratch1(__ as(), r8);
MOV_VOLATILE(rscratch1, $mem$$base, $mem$$index, $mem$$scale,
$mem$$disp,
rscratch1, ldarw);
__ fmovs(as_FloatRegister($dst$$reg), rscratch1);
. . .
or, if need be
enc_class aarch64_enc_fldars(vRegF dst, memory mem) %{
{
ScratchRegister rscratch1(__ as(), r8);
MOV_VOLATILE(rscratch1, $mem$$base, $mem$$index, $mem$$scale,
$mem$$disp,
rscratch1, ldarw);
__ fmovs(as_FloatRegister($dst$$reg), rscratch1);
}
. . .
I realize this is extra 'protocol' for cases where register use is
managed by other means and where the invoked assembler methods are not
going to risk reuse of the register. However, the bare use of r8 (or
even under the name r8_scratch) is actually quite confusing. Using the
same protocol for acquiring a scratch register has the important virtue
of consistency. It also allows us to find a way later to avoid having to
explicitly name r8 and r9 in the declaration and, equally, having to
explicitly name them in these encodings e.g. we might just pass an index
in the constructor:
ScratchRegister rscratch1(__ as(), 1);
ScratchRegister rscratch2(__ as(), 2);
This is not the only place where your usage is confusing. You do
(although not consistently) use declarations in some of the later
encodings. For example, further down the file:
3526 enc_class aarch64_enc_fast_lock(iRegP object, iRegP box, iRegP
tmp, iRegP tmp2) %{
. . .
3570 // markWord of object (disp_hdr) with the stack pointer.
3571 __ mov(r8_scratch, sp);
3572 __ sub(disp_hdr, disp_hdr, r8_scratch);
. . .
has no declaration ... but is immediately followed by
3604 enc_class aarch64_enc_fast_unlock(iRegP object, iRegP box, iRegP
tmp, iRegP tmp2) %{
. . .
3644 ScratchRegister rscratch1(__ as(), r8);
. . .
Declarations are also used in some of the instruction definitions:
12103 instruct rolI_rReg(iRegINoSp dst, iRegI src, iRegI shift,
rFlagsReg cr)
. . .
12109 ins_encode %{
12110 ScratchRegister rscratch1(__ as(), r8);
12111 __ subw(rscratch1, zr, as_Register($shift$$reg));
. . .
Is there a clear rationale for whether or not to declare a
ScratchRegister that I am missing? I'd be happier with them being used
everywhere, avoiding explicit mention of Register names at points of use.
Also, one other thing I don't understand but I think is just an error:
3008 enc_class aarch64_enc_stlxr(iRegLNoSp src, memory mem) %{
3009 MacroAssembler _masm(&cbuf);
3010 Register src_reg = as_Register($src$$reg);
3011 Register base = as_Register($mem$$base);
3012 ScratchRegister rscratch2(__ as(), r9);
. . .
3017 if (disp != 0) {
3018 __ lea(r9_scratch, Address(base, disp));
3019 __ stlxr(r8_scratch, src_reg, r9_scratch);
. . .
Why are you declaring r9 as temp rscratch2, not declaring r8 as
rscratch1 and then using both as r9_scratch and r8_scratch? Note also
that in the previous encoding (aarch64_enc_ldaxr) you use r9_scratch and
r8_scratch and don't declare any ScratchRegister.
Oh and
3117 enc_class aarch64_enc_prefetchw(memory mem) %{
. . .
3130 ScratchRegister r8_scratch(__ as(), r8);
3131 __ lea(r8_scratch, Address(base, disp));
3132 __ prfm(Address(r8_scratch, index_reg,
Address::lsl(scale)), PSTL1KEEP);
Why name the scratch register r8_scratch in this decl (shadowing an
existing name) rather than calling it rscratch1?
c1_LIRAssembler_aarch64.cpp:
I found the comment here to be misleading
1581 void LIR_Assembler::casw(Register addr, Register newval, Register
cmpval) {
1582 // r8 is used to pass an argument here, not as scratch. See
1583 // LIRGenerator::atomic_cmpxchg.
1584 __ cmpxchg(addr, cmpval, newval, Assembler::word, /* acquire*/
true, /* release*/ true, /* weak*/ false, r8);
1585 __ cset(r8, Assembler::NE);
1586 __ membar(__ AnyAny);
1587 }
Firstly, to be more strict/precise, r8 is used to return a result from
cmpxchg in order to then pass it on to cset (it read to me like you were
saying that it passes a value into cmpxchg).
More importantly, what the comment does not clarify is that the reason
you cannot allocate r8 as a scratch register here at the point of call
with a ScratchRegister decl is because cmpxchg itself conditionally
reserves r8 as a scratch register in the case where a client passes
noreg. So using a ScratchRegister here would break cmpxchg and not using
a ScratchRegister in cmpxchg would disallow other callers from passing
in noreg or require it to provide two paths to handle the two cases,
noreg or a scratch reg, allocating a ScratchRegister in only one of
those paths. This is all really a bit clumsy and certainly unclear.
This sort of case where a called method conditionally allocates a
scratch register is an important one to be able to handle. I think a way
to deal with this might be to allow methods like cmpxchg to declare a
local scratch register which /uses/ a supplied register if provided and
only /allocates/ a specific scratch register if noreg is supplied. That
would allow the caller to allocate a scratch register and pass it in or
not allocate the register and pass in noreg. Of course it also allows a
caller to pass in a non-scratch register.
So, given the above definition and your current definition for cmpxchg, viz:
2497 void MacroAssembler::cmpxchg(Register addr, Register expected,
2498 Register new_val,
2499 enum operand_size size,
2500 bool acquire, bool release,
2501 bool weak,
2502 Register result) {
2503 ScratchRegister rscratch1(as(), r8);
2504 if (result == noreg) result = rscratch1;
<blithely use result>
we could replace them with
void LIR_Assembler::casw(Register addr, Register newval, Register cmpval) {
// allocate scratch register to return result
// and pass it on to cset
ScratchRegister rscratch1( __ as(), r8);
__ cmpxchg(addr, cmpval, newval, Assembler::word, /* acquire*/ true,
/* release*/ true, /* weak*/ false, rscratch1);
__ cset(rscratch1, Assembler::NE);
__ membar(__ AnyAny);
}
and
void MacroAssembler::cmpxchg(Register addr, Register expected,
Register new_val,
enum operand_size size,
bool acquire, bool release,
bool weak,
Register result) {
ScratchRegister rscratch1(as(),
/*use without alloc if reg*/ result,
/*alloc if noreg*/ r8);
<blithely use rscratch1 instead of result>
This doesn't really work quite right because the caller has to inhibit
allocation in the called method either by passing 1) an explicit
non-scratch register or 2) an allocated scratch register i.e. it is not
quite uniform. However, in cases where noreg is passed the implication
is clear that that rscratch1 (possibly also rscratch2 if two noreg args
may be passed) will be allocated by the callee as an alternative.
n.b. the comment in the original version of LIR_Assembler::casw refers
to LIRGenerator::atomic_cmpxchg. Do you not actually mean
MacroAssembler::cmpxchg?
c1_Runtime_aarch64.cpp
I don't really like this:
51 int StubAssembler::call_RT(Register oop_result1, Register
metadata_result, address entry, int args_size) {
52 // setup registers
. . .
58 FreeScratchRegs dummy(as());
59 ScratchRegister rscratch1(as(), r8);
60 ScratchRegister rscratch2(as(), r9);
Why are the regs freed here rather than in the caller? I realise this
case is special because the callee can know that scratch regs are now
invalid (since we are about to plant a blr all bets are off). However,
this is a disparity with other use cases where the caller needs to make
the decision to free temps. Intentions and expectations would be clearer
if the caller were required to explicitly release scratch vars before a
call to call_RT (and then maybe reallocate them again afterwards).
interp_masm_aarch64.cpp:
You have these top level declarations:
47 static const Register rscratch1 = r8;
48 static const Register rscratch2 = r9;
So, we don't (now or in future) use ScratchRegister here? Why not? Or is
this just an expedient hack? Once again, I'd prefer scratch uses to be
uniform across all the code.
interp_masm_aarch64.hpp:
293 set_last_Java_frame(esp, rfp, (address) pc(), r8);
So, here you are using r8 rather than r8_scratch. Was there a rationale
for that?
interpreterRT_aarch64.cpp:
41 static const Register rscratch1 = r8;
42 static const Register rscratch2 = r9;
Same comment as above for interp_masm_aarch64.cpp.
macroAssembler_aarch64.cpp:
Several more occurrences of callee freeing that make sense given that a
call is being planted but which I think would be clearer if done in the
relevant callers:
679 void MacroAssembler::call_VM_base(Register oop_result,
680 Register java_thread,
681 Register last_java_sp,
682 address entry_point,
683 int number_of_arguments,
684 bool check_exceptions) {
685 FreeScratchRegs regs(as());
. . .
804 address MacroAssembler::emit_trampoline_stub(int
insts_call_instruction_offset,
805 address dest) {
. . .
821 FreeScratchRegs dummy(as());
. . .
1456 void MacroAssembler::call_VM_leaf_base(address entry_point,
1457 int number_of_arguments,
1458 Label *retaddr) {
1459 Label E, L;
1460
1461 FreeScratchRegs dummy(as()); // VM calls clobber all registers but
1462 // we preserve rscratch1.
1463 ScratchRegister rscratch1(as(), r8);
. . .
Also, I don't follow why you sometimes use r8 and other times
r9_scratch in this file.
methodHandles_aarch64.cpp
Once again free in a callee that I'd prefer to see done in the callers:
97 void MethodHandles::jump_from_method_handle(MacroAssembler* _masm,
Register method, Register temp,
98 bool for_compiler_entry) {
99 FreeScratchRegs dummy(__ as());
100 ScratchRegister rscratch1(__ as(), r8);
128 void MethodHandles::jump_to_lambda_form(MacroAssembler* _masm,
129 Register recv, Register
method_temp,
130 Register temp2,
131 bool for_compiler_entry) {
132 FreeScratchRegs(__ as());
stubGenerator_aarch64.cpp
Two cases where it is clear a more selective release would be useful
2104 {
2105 FreeScratchRegs dummy(__ as());
2106 ScratchRegister rscratch2(__ as(), r9);
. . .
2188 {
2189 FreeScratchRegs dummy(__ as());
2190 ScratchRegister rscratch2(__ as(), r9);
templateTable_aarch64.cpp
46 static const Register rscratch1 = r8;
47 static const Register rscratch2 = r9;
Same comment as above for interp_masm_aarch64.cpp.
More information about the aarch64-port-dev
mailing list