RFD: scratch registers cleanup [Long post]
[TL, DR: A large patch to clean up the use of scratch registers in the AArch64 port. Anyone who has ever written assembly code for the AArch64 port should probably read this.] In AArch64 HotSpot we reserve two scratch registers for general use and especially for use in macros. When we first wrote the port things were fairly simple: low-level assembler macros would use these registers freely and any macro that calls a macro would have to be aware that rscratch1 and rscratch2 would be clobbered. However, things haven't quite worked out that way. In practice people "know" that the macro WhizzyMacro they're using only clobbers rscratch1 and so rscratch2 will be "safe" and its contents will be preserved across WhizzyMacro. The danger of this is obvious: a maintenance programmer less familiar with the code base will come along and use some other registers. This is particularly problematic when a macro is used in many places, and if there are several versions of a macro, as happens in the case of GC barriers. There is another risk: aliasing, the extreme sport of referring to the same register in a macro by more than one name. Again, the risk is obvious: the poor programmer won't know that the registers "tmp" and "rscratch2" are in fact the same register. And there is the most excitingly risky practice of all, where an input operand to WhizzyMacro is passed in one of the scratch registers *by a different name* and WhizzyMacro also freely uses the same scratch register by its generic name. This has caused some programmers to confuse themselves about the code they were writing *while they were writing it*. Please do not do this. It is the road to perdition. Bear in mind that it is sometimes convenient to avoid declaring scratch registers in macros altogether. You can do this by passing all of the temporary registers a macro needs explicitly as arguments, and this can make the code smaller and simpler. Despite all of the foregoing, I still believe that reserving a couple of scratch registers was a good idea, but we need to clarify rules of ownership so that it is always clear to the reader who "owns" which scratch registers. Sometimes this is informally documented in comments, often not. I propose to mitigate this problem by removing the global declarations of rscratch1 and rscratch2 altogether and instead requiring them to be declared at the point of use. The Assembler will keep track of which registers are in use and which are free. For example, to use rscratch1 declare it thusly: ScratchRegister rscratch1(__ as(), r8); After this declaration rscratch1 may be used until the end of the innermost enclosing scope, like so: { ScratchRegister rscratch1(__ as(), r8); __ ldrb(rscratch1, gc_state); __ tbz(rscratch1, ShenandoahHeap::MARKING_BITPOS, done); } [NB: it is possible to give the scratch register a name other than rscratch1, but please don't!] Let's say you need to call InnerMacro from WhizzyMacro, and both InnerMacro and WhizzyMacro need scratch registers. You can do this by explicitly freeing the scratch registers, like so: WhizzyMacro() { ... ScratchRegister rscratch1(__ as(), r8); // Code using rscratch1 ... { FreeScratchRegs dummy(__ as()); __ InnerMacro(); // Try to use rscratch1 here and you'll get an assertion // because you don't own it. } // You can use rscratch1 again here. ... } In this case it is clear to the reader that InnerMacro() may clobber both scratch registers. At the end of the scope of FreeScratchRegs, rscratch1 is again owned by WhizzyMacro. I've written a prototype of this idea which does not (intentionally) change any of the generated code, it just documents what it does, and it enforces it. Anyone who uses a register way which conflicts with what has been declared will get an assertion failure. In a few special cases in this prototype I haven't attempted to use the new syntax. The interpreter has its own convention for register usage so scattering ScratchRegister declarations wouldn't much help. Likewise in aarch64.ad: many instruction patterns wouldn't benefit from such declarations because scratch registers are always free to use in patterns, but be careful if the patterns call macros. In that case you should probably declare the scratch registers you use. Some of the resulting patch looks gnarly and complex, but that's just the way that the existing code is structured, I'm just documenting it. The code itself probably could be rewritten to make things simpler and cleaner, but not as part of this patch. Webrev at http://cr.openjdk.java.net/~aph/scratch-regs/webrev/ I'm thinking of disabling register ownership checking in release builds and release branches because I don't want assertions to trigger in production systems and cause the VM to abort. We should leave it on in debug builds and the development trunk. Finally, I'm not wedded to the current syntax for declarations: I can think of some nicer ways to do it so that you could say use_scratch(rscratch1); rather than ScratchRegister rscratch1(__ as(), r8); Comments gratefully received. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. <https://www.redhat.com> https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
Hi Andrew, In general I think this is an improvement. I've put some comments below.
However, things haven't quite worked out that way. In practice people "know" that the macro WhizzyMacro they're using only clobbers rscratch1 and so rscratch2 will be "safe" and its contents will be preserved across WhizzyMacro. The danger of this is obvious: a maintenance programmer less familiar with the code base will come along and use some other registers. This is particularly problematic when a macro is used in many places, and if there are several versions of a macro, as happens in the case of GC barriers.
Do you think this partly stems from having *two* scratch registers, where one or the other is often not clobbered? The 32-bit Arm ports gets away with just a single Rtmp, and perhaps in that case programmers are more likely to think it's always clobbered.
There is another risk: aliasing, the extreme sport of referring to the same register in a macro by more than one name. Again, the risk is obvious: the poor programmer won't know that the registers "tmp" and "rscratch2" are in fact the same register.
This can be mitigated somewhat if the macros name all the registers they use up-front and then do an assert_different_registers(...).
Bear in mind that it is sometimes convenient to avoid declaring scratch registers in macros altogether. You can do this by passing all of the temporary registers a macro needs explicitly as arguments, and this can make the code smaller and simpler.
Yes, it makes the macro much easier to understand in isolation. As someone new to the codebase I would be a bit surprised that seemingly primitive operation like cmpxchg, increment, compare_eq, etc. clobber scratch registers. I think readability would improve if we added explicit temporary arguments to these macros. So that the majority of MacroAssembler::* then doesn't use the implicit scratch registers at all. Maybe we can try this as an exercise and see what's left over? I guess it wouldn't increase verbosity all that much.
{ ScratchRegister rscratch1(__ as(), r8); __ ldrb(rscratch1, gc_state); __ tbz(rscratch1, ShenandoahHeap::MARKING_BITPOS, done); }
[NB: it is possible to give the scratch register a name other than rscratch1, but please don't!]
It's also possible to do `ScratchRegister rscratch1(__ as(), r9)' which is perhaps more confusing.
Let's say you need to call InnerMacro from WhizzyMacro, and both InnerMacro and WhizzyMacro need scratch registers. You can do this by explicitly freeing the scratch registers, like so:
WhizzyMacro() { ... ScratchRegister rscratch1(__ as(), r8); // Code using rscratch1 ... { FreeScratchRegs dummy(__ as()); __ InnerMacro(); // Try to use rscratch1 here and you'll get an assertion // because you don't own it. } // You can use rscratch1 again here. ... }
In this case it is clear to the reader that InnerMacro() may clobber both scratch registers. At the end of the scope of FreeScratchRegs, rscratch1 is again owned by WhizzyMacro.
But you could equally write it with two separate ScratchRegister scopes and avoid FreeScratchRegs altogether: WhizzyMacro() { ... { ScratchRegister rscratch1(__ as(), r8); // Code using rscratch1 ... } __ InnerMacro(); // Try to use rscratch1 here and you'll get an assertion // because you don't own it. { ScratchRegister rscratch1(__ as(), r8); // You can use rscratch1 again here. ... } } Visually this makes it clear to the reader that the two "rscratch1"s are unrelated and not preserved across InnerMacro. Looking at the patch I can see why FreeScratchRegs helps minimise changes to the existing code, but it still feels like a bit of a kludge.
Webrev at http://cr.openjdk.java.net/~aph/scratch-regs/webrev/
I couldn't apply that cleanly on top of jdk/jdk. Maybe needs rebasing? src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp +class ScratchRegister { + Register _r; + MacroAssembler *_masm; I think this class can inherit from StackObj to disallow new/delete, like the other RAII types in Hotspot do. + ScratchRegister(const ScratchRegister &r) { + _r = r._r; + _masm = NULL; + } Do we really need to support copying? It seems to complicate the model quite a lot and has some issues e.g. if the copy outlives the original. I can only see it being used in the new cmpw overload - do we actually call that? src/hotspot/cpu/aarch64/interp_masm_aarch64.hpp - set_last_Java_frame(esp, rfp, (address) pc(), rscratch1); + set_last_Java_frame(esp, rfp, (address) pc(), r8); This is a bit icky. Can we move the body of _call_Unimplemented into the .cpp file where there is already a global constant for rscratch1? src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp +void tassert(bool shouldBeTrue, const char *msg) { + if (! shouldBeTrue) { + fprintf(stderr, "%s\n", msg); + } +} This isn't used?
Finally, I'm not wedded to the current syntax for declarations: I can think of some nicer ways to do it so that you could say
use_scratch(rscratch1);
rather than
ScratchRegister rscratch1(__ as(), r8);
I think it's OK as-is, and preferable to anything that uses pre-processor magic. Maybe something like the following using a move constructor for ScratchRegister would be prettier: ScratchRegister rscratch1(__ scratch_register(r8)); Thanks, Nick
Hi, On 10/31/19 8:04 AM, Nick Gasson wrote:
In general I think this is an improvement. I've put some comments below.
However, things haven't quite worked out that way. In practice people "know" that the macro WhizzyMacro they're using only clobbers rscratch1 and so rscratch2 will be "safe" and its contents will be preserved across WhizzyMacro. The danger of this is obvious: a maintenance programmer less familiar with the code base will come along and use some other registers. This is particularly problematic when a macro is used in many places, and if there are several versions of a macro, as happens in the case of GC barriers.
Do you think this partly stems from having *two* scratch registers, where one or the other is often not clobbered? The 32-bit Arm ports gets away with just a single Rtmp, and perhaps in that case programmers are more likely to think it's always clobbered.
It's possible, but there are many cases where the code would be worse with only one.
There is another risk: aliasing, the extreme sport of referring to the same register in a macro by more than one name. Again, the risk is obvious: the poor programmer won't know that the registers "tmp" and "rscratch2" are in fact the same register.
This can be mitigated somewhat if the macros name all the registers they use up-front and then do an assert_different_registers(...).
I think you're missing my point: this aliasing has been done *deliberately*. So the assert_different_registers(...) isn't there.
Bear in mind that it is sometimes convenient to avoid declaring scratch registers in macros altogether. You can do this by passing all of the temporary registers a macro needs explicitly as arguments, and this can make the code smaller and simpler.
Yes, it makes the macro much easier to understand in isolation.
As someone new to the codebase I would be a bit surprised that seemingly primitive operation like cmpxchg, increment, compare_eq, etc. clobber scratch registers.
That ease of use was the reason for having the scratch registers: that they were always free, and any macro could use them. IMO this is a desirable property if it can be used safely, and I believe it can be.
I think readability would improve if we added explicit temporary arguments to these macros. So that the majority of MacroAssembler::* then doesn't use the implicit scratch registers at all. Maybe we can try this as an exercise and see what's left over? I guess it wouldn't increase verbosity all that much.
I disagree. It wouldn't much help readability or ease of maintenance. Someone would have to be responsible at the top level for choosing which scratch registers to be passed down. But how do you know which is and is not a "top level" macro? Sometimes what was once a "top level" macro becomes a lower-level one. I'm proposing that we handle it automatically.
{ ScratchRegister rscratch1(__ as(), r8); __ ldrb(rscratch1, gc_state); __ tbz(rscratch1, ShenandoahHeap::MARKING_BITPOS, done); }
[NB: it is possible to give the scratch register a name other than rscratch1, but please don't!]
It's also possible to do `ScratchRegister rscratch1(__ as(), r9)' which is perhaps more confusing.
The penalty for that should be severe, but I would like to try to make it impossible.
Let's say you need to call InnerMacro from WhizzyMacro, and both InnerMacro and WhizzyMacro need scratch registers. You can do this by explicitly freeing the scratch registers, like so:
WhizzyMacro() { ... ScratchRegister rscratch1(__ as(), r8); // Code using rscratch1 ... { FreeScratchRegs dummy(__ as()); __ InnerMacro(); // Try to use rscratch1 here and you'll get an assertion // because you don't own it. } // You can use rscratch1 again here. ... }
In this case it is clear to the reader that InnerMacro() may clobber both scratch registers. At the end of the scope of FreeScratchRegs, rscratch1 is again owned by WhizzyMacro.
But you could equally write it with two separate ScratchRegister scopes and avoid FreeScratchRegs altogether:
WhizzyMacro() { ... { ScratchRegister rscratch1(__ as(), r8); // Code using rscratch1 ... } __ InnerMacro(); // Try to use rscratch1 here and you'll get an assertion // because you don't own it. { ScratchRegister rscratch1(__ as(), r8); // You can use rscratch1 again here. ... } }
Sure, and in such a simple case I would, but it's a very simple case for illustration. Maybe I should have done something more complicated, but it would obfuscate the example.
Visually this makes it clear to the reader that the two "rscratch1"s are unrelated and not preserved across InnerMacro. Looking at the patch I can see why FreeScratchRegs helps minimise changes to the existing code, but it still feels like a bit of a kludge.
We have to be realistic, and not let the best be the enemy of the good enough. This patch introduces a new facility rather and uses it, rather than what would be a fairly large-scale reorganization of the code base. There are many cases where we have deep nesting: for example, one macro calls another, and there are many nested scopes, and in the middle someone needs a scratch register. One common case is where a macro calls a macro calls a macro, and there is a callout to the runtime which needs a scratch register. I can (almost) guarantee that this patch won't change the generated code and therefore won't break anything. That is an important guarantee at this stage. The chances of rewriting all macros (to pass scratch registers explicitly) without making mistakes and breaking things in very hard-to-debug ways is asymptotically close to zero.
Webrev at http://cr.openjdk.java.net/~aph/scratch-regs/webrev/
I couldn't apply that cleanly on top of jdk/jdk. Maybe needs rebasing?
OK, will do. It's not very old, but things move fast.
src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp
+class ScratchRegister { + Register _r; + MacroAssembler *_masm;
I think this class can inherit from StackObj to disallow new/delete, like the other RAII types in Hotspot do.
+ ScratchRegister(const ScratchRegister &r) { + _r = r._r; + _masm = NULL; + }
Do we really need to support copying? It seems to complicate the model quite a lot and has some issues e.g. if the copy outlives the original. I can only see it being used in the new cmpw overload - do we actually call that?
You're right, we could do wihtout that.
src/hotspot/cpu/aarch64/interp_masm_aarch64.hpp
- set_last_Java_frame(esp, rfp, (address) pc(), rscratch1); + set_last_Java_frame(esp, rfp, (address) pc(), r8);
This is a bit icky. Can we move the body of _call_Unimplemented into the .cpp file where there is already a global constant for rscratch1?
That seems reasonable.
src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp
+void tassert(bool shouldBeTrue, const char *msg) { + if (! shouldBeTrue) { + fprintf(stderr, "%s\n", msg); + } +}
This isn't used?
It's all Work In Progress. :-)
Finally, I'm not wedded to the current syntax for declarations: I can think of some nicer ways to do it so that you could say
use_scratch(rscratch1);
rather than
ScratchRegister rscratch1(__ as(), r8);
I think it's OK as-is, and preferable to anything that uses pre-processor magic. Maybe something like the following using a move constructor for ScratchRegister would be prettier:
ScratchRegister rscratch1(__ scratch_register(r8));
OK, I'll give that some more thought. I'm still thinking I need some automatic way to name the scratch registers correctly so that some silly programmer can't do the Wrong Thing. Thank you for such a thoughtful and detailed reply. It must have taken some time to write, and it is very helpful. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. <https://www.redhat.com> https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
On 10/31/19 8:04 AM, Nick Gasson wrote:
I couldn't apply that cleanly on top of jdk/jdk. Maybe needs rebasing?
Phew. New patch at http://cr.openjdk.java.net/~aph/scratch-regs/webrev.1/ -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. <https://www.redhat.com> https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
On 30/10/2019 18:58, Andrew Haley wrote:
[TL, DR: A large patch to clean up the use of scratch registers in the AArch64 port. Anyone who has ever written assembly code for the AArch64 port should probably read this.] . . . Comments gratefully received. I'm looking at this. It's a very nice idea and will take some time to give it the review it deserves.
regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill
On 10/31/19 10:58 AM, Andrew Dinn wrote:
On 30/10/2019 18:58, Andrew Haley wrote:
[TL, DR: A large patch to clean up the use of scratch registers in the AArch64 port. Anyone who has ever written assembly code for the AArch64 port should probably read this.] . . . Comments gratefully received.
I'm looking at this. It's a very nice idea and will take some time to give it the review it deserves.
OK. bear in mind that it's still a bit scrappy, but I'm releasing it early to give you that time. BTW, I wrote an earlier version of this patch which allocated the registers automagically rather than the requiring the programmer to name them, but it turned into such a mess that I gave up. IMO it's better to do this more gradually, with this first patch making essentially no changes. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. <https://www.redhat.com> https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
Hi Andrew, This is a good start and it could be used as is. However,I think it needs some polishing to improve the way it works. That could involve upgrading the model for how scratch registers are declared, allocated, released and re-allocated. Alternatively, it could just involve improving the way the current definitions are employed. I'll start with general comments which address that broader point before going on to mention specific issues to do with the current patch. These latter comments will also help to motivate the general point but I think it is better to make them in context. See after the sig for these comments. regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill ----- 8< -------- 8< -------- 8< -------- 8< -------- 8< --- I think this is a very good start to remedying a nasty problem. However, I'm not yet convinced the model for how you manage and use scratch registers is the best way to do this. I'll explain why below and suggest a variation that might or might not work. If it doesn't then correct me and that will at least help understand why you ended up with the current design. If it does work or, at least, suggests how to move towards something better then let's try another round. What troubles me most of all is that the mechanism you have provided can be quite opaque about ownership/liveness of registers. I think that happens because at root it does not require that declaration + allocation, release and subsequent re-allocation are co-located in the same program scope (or,at least, in the same method). Let's work through some examples to clarify that view. As it stands it is possible to allocate some scratch registers in a caller method and then release those registers in a called method (possibly more than one call down the stack) -- indeed, the latter operation occurs in this code from the patch: c1_LIRAssembler_aarch64.cpp: 960 void LIR_Assembler::mem2reg(LIR_Opr src, LIR_Opr dest, BasicType type, LIR_PatchCode patch_code, CodeEmitInfo* info, bool wide, bool /* unaligned */) { 961 LIR_Address* addr = src->as_address_ptr(); . . . 976 int null_check_here = code_offset(); 977 FreeScratchRegs dummy(__ as()); . . . ote that we are not in the scope of a visible ScratchRegister declaration here. So, the use fo a FreeScratchRegs declaration implies that we may be in the scope of a declaration up the call chain. If that may be the case then how can the callee safely cancel that allocation? How can it be sure that the caller is not relying on a scratch register retaining its value across the call. Likewise from the POV of the caller which might have declared a ScratchRegister there is nothing in the code (except perhaps for comments) to indicate that said scratch register might be overwritten under the call to this method (or perhaps even a call to an indirect caller). To those who do not know the details of the callee a ScratchRegister declared in the caller will appear to remain valid across the call even though it may be overwritten. Another example occurs later in the same file 1464 void LIR_Assembler::emit_opTypeCheck(LIR_OpTypeCheck* op) { 1465 const bool should_profile = op->should_profile(); 1466 1467 LIR_Code code = op->code(); 1468 if (code == lir_store_check) { . . . 1495 if (should_profile) { 1496 Label not_null; 1497 __ cbnz(value, not_null); 1498 // Object is null; update MDO and exit 1499 ScratchRegister rscratch1(__ as(), r8); 1500 ScratchRegister rscratch2(__ as(), r9); . . . . . . 1554 } else if (code == lir_checkcast) { 1555 FreeScratchRegs dummy(__ as()); . . . 1564 } else if (code == lir_instanceof) { 1565 FreeScratchRegs dummy(__ as()); . . . Once again the release operations occur in a scope where there is no prior declaration in the same scope. So, this use of release declarations in the else-if clauses implies that a caller may have allocated a scratch register yet they presume it is safe to release it. Meanwhile, to make matters worse, the ScratchRegister declarations in the initial if clause imply that a caller which exercises this path will /not/ have declared a scratch register. That further implies that the validity of these combined assumptions about what declarations might be in scope depends on the callee's conditional logic. Now that's starting to get a tad too opaque and I fear we are heading down a garden path. Of course, this may just be down to the fact that the inclusion of those release operations was superfluous; that they could be removed from these examples. However, I don't like the fact that one can construct such examples. I would much prefer it if this usage was avoided where possible by requiring the release to occur in the scope where the scratch register was declared. i.e. the caller must arrange to free scratch registers if a caller might need them. That would make it much easier to detect circumstances where the caller was trying to have its cake and eat it. This model of usage is indeed what happens much of the time e.g. also from the same file: 1280 void LIR_Assembler::type_profile_helper(Register mdo, 1281 ciMethodData *md, ciProfileData *data, 1282 Register recv, Label* update_done) { 1283 ScratchRegister rscratch1(__ as(), r8); 1284 ScratchRegister rscratch2(__ as(), r9); . . . 1293 { 1294 FreeScratchRegs dummy(__ as()); . . . 1298 } . . . With this usage it is very clear that any scratch value established before line 1294 cannot be relied on in code after line 1298. I suspect that there are circumstances where a caller can or even must free scratch registers down the call tree from a allocation so perhaps we cannot afford always to dispense with this current usage for FreeScratchRegs altogether. However, I'd prefer it if an explicit scope-local release was used for wherever possible to avoid this 2nd-guessing of callers' intentions. In which case, it would be much better if we could use the grammar of language definitions to support that .. . . . . which also provides the opportunity to remedy another shortcoming. The current release operation does not specify which scratch register(s) is (are) to be released. The only option is free all registers. However, if a release operation must always be located in the scope of a ScratchRegister then it is perfectly possible to do so by passing the relevant ScratchRegister as a parameter. So, perhaps we could use macros like: FreeScratchRegister dummy(rscratch1) FreeScratchRegisters dummy(rscratch1, rscratch2); That makes it harder for scratch registers to be freed down the stack in a caller because the caller will not have access to rscratch1 and rscratch2 unless they are passed in by reerence or pointer. I'm assuming code won't pass a ScratchRegister as a call argument, merely the encapsulated Register (we might perhaps be able to knobble that by ensuring that copying/dereferencing reset the register to noreg). If this revised model works then the current FreeScratchRegs really needs renaming to something less appealing like DeleteCallerScratchBindings that won't tempt anyone to use it. Also, it would perhaps be useful to provide a reverse operation for re-allocating a previouly declared and freed scratch register e.g. public void Foo::bar(...) ScratchRegister rscratch1(...); ScratchRegister rscratch2(...); . . . <use rscratch1 and rscratch2> . . . FreeScratchRegister dummy(rscratch1); call_user_of_rscratch1(...); UseScratchRegister dummy2(rscratch1); ... This explicitly acknowledges that the call may want to use rsctratch1 and explicitly signals in the caller code that rscratch1 cannot be expected to remain constant across the call. Of course, one can always achieve the same effect with block nesting but I think this is neater. There is a bit more to this to do with conditional allocation of a scratch register that I will discuss in the detailed comments (for c1_LIRAssembler) but that needs to be done in context so ... ... on to the specific comments: aarch64.ad: Firstly, just a humdrum error. You missed a conversion for a specific use of rscratch1 2851 enc_class aarch64_enc_stlrb(iRegI src, memory mem) %{ 2852 MOV_VOLATILE(as_Register($src$$reg), $mem$$base, $mem$$index, $mem$$scale, $mem$$disp, 2853 rscratch1, stlrb); 2854 %} This is still passing rscratch1 in the MOV_VOLATILE macro call where subsequent encoding classes use r8_scratch. It doesn't actually matter (i.e. it doesn't fail to compile) because MOV_VOLATILE ignores the SCRATCH argument. I know you are trying to minimize changes but dropping that argument in all uses would be a whole lot better . . . yes, maybe in the next patch. This story continues . . . 2930 enc_class aarch64_enc_fldars(vRegF dst, memory mem) %{ 2931 MOV_VOLATILE(r8_scratch, $mem$$base, $mem$$index, $mem$$scale, $mem$$disp, 2932 r8_scratch, ldarw); 2933 __ fmovs(as_FloatRegister($dst$$reg), r8_scratch); 2934 %} I'm not sure why you invented the name r8_scratch as an alias for r8. Does that really help? I couldn't discern why you sometimes used one and why sometimes the other. If there is a rationale it probaly ought to be documented somewhere (at least at the point where r8_scratch aand r9_scratch are declared). Anyway, what I don't follow is why are these uses just employ a bare register name. Why are they exempt from the need to declare a scratch register for r8 before using it? Why not: enc_class aarch64_enc_fldars(vRegF dst, memory mem) %{ ScratchRegister rscratch1(__ as(), r8); MOV_VOLATILE(rscratch1, $mem$$base, $mem$$index, $mem$$scale, $mem$$disp, rscratch1, ldarw); __ fmovs(as_FloatRegister($dst$$reg), rscratch1); . . . or, if need be enc_class aarch64_enc_fldars(vRegF dst, memory mem) %{ { ScratchRegister rscratch1(__ as(), r8); MOV_VOLATILE(rscratch1, $mem$$base, $mem$$index, $mem$$scale, $mem$$disp, rscratch1, ldarw); __ fmovs(as_FloatRegister($dst$$reg), rscratch1); } . . . I realize this is extra 'protocol' for cases where register use is managed by other means and where the invoked assembler methods are not going to risk reuse of the register. However, the bare use of r8 (or even under the name r8_scratch) is actually quite confusing. Using the same protocol for acquiring a scratch register has the important virtue of consistency. It also allows us to find a way later to avoid having to explicitly name r8 and r9 in the declaration and, equally, having to explicitly name them in these encodings e.g. we might just pass an index in the constructor: ScratchRegister rscratch1(__ as(), 1); ScratchRegister rscratch2(__ as(), 2); This is not the only place where your usage is confusing. You do (although not consistently) use declarations in some of the later encodings. For example, further down the file: 3526 enc_class aarch64_enc_fast_lock(iRegP object, iRegP box, iRegP tmp, iRegP tmp2) %{ . . . 3570 // markWord of object (disp_hdr) with the stack pointer. 3571 __ mov(r8_scratch, sp); 3572 __ sub(disp_hdr, disp_hdr, r8_scratch); . . . has no declaration ... but is immediately followed by 3604 enc_class aarch64_enc_fast_unlock(iRegP object, iRegP box, iRegP tmp, iRegP tmp2) %{ . . . 3644 ScratchRegister rscratch1(__ as(), r8); . . . Declarations are also used in some of the instruction definitions: 12103 instruct rolI_rReg(iRegINoSp dst, iRegI src, iRegI shift, rFlagsReg cr) . . . 12109 ins_encode %{ 12110 ScratchRegister rscratch1(__ as(), r8); 12111 __ subw(rscratch1, zr, as_Register($shift$$reg)); . . . Is there a clear rationale for whether or not to declare a ScratchRegister that I am missing? I'd be happier with them being used everywhere, avoiding explicit mention of Register names at points of use. Also, one other thing I don't understand but I think is just an error: 3008 enc_class aarch64_enc_stlxr(iRegLNoSp src, memory mem) %{ 3009 MacroAssembler _masm(&cbuf); 3010 Register src_reg = as_Register($src$$reg); 3011 Register base = as_Register($mem$$base); 3012 ScratchRegister rscratch2(__ as(), r9); . . . 3017 if (disp != 0) { 3018 __ lea(r9_scratch, Address(base, disp)); 3019 __ stlxr(r8_scratch, src_reg, r9_scratch); . . . Why are you declaring r9 as temp rscratch2, not declaring r8 as rscratch1 and then using both as r9_scratch and r8_scratch? Note also that in the previous encoding (aarch64_enc_ldaxr) you use r9_scratch and r8_scratch and don't declare any ScratchRegister. Oh and 3117 enc_class aarch64_enc_prefetchw(memory mem) %{ . . . 3130 ScratchRegister r8_scratch(__ as(), r8); 3131 __ lea(r8_scratch, Address(base, disp)); 3132 __ prfm(Address(r8_scratch, index_reg, Address::lsl(scale)), PSTL1KEEP); Why name the scratch register r8_scratch in this decl (shadowing an existing name) rather than calling it rscratch1? c1_LIRAssembler_aarch64.cpp: I found the comment here to be misleading 1581 void LIR_Assembler::casw(Register addr, Register newval, Register cmpval) { 1582 // r8 is used to pass an argument here, not as scratch. See 1583 // LIRGenerator::atomic_cmpxchg. 1584 __ cmpxchg(addr, cmpval, newval, Assembler::word, /* acquire*/ true, /* release*/ true, /* weak*/ false, r8); 1585 __ cset(r8, Assembler::NE); 1586 __ membar(__ AnyAny); 1587 } Firstly, to be more strict/precise, r8 is used to return a result from cmpxchg in order to then pass it on to cset (it read to me like you were saying that it passes a value into cmpxchg). More importantly, what the comment does not clarify is that the reason you cannot allocate r8 as a scratch register here at the point of call with a ScratchRegister decl is because cmpxchg itself conditionally reserves r8 as a scratch register in the case where a client passes noreg. So using a ScratchRegister here would break cmpxchg and not using a ScratchRegister in cmpxchg would disallow other callers from passing in noreg or require it to provide two paths to handle the two cases, noreg or a scratch reg, allocating a ScratchRegister in only one of those paths. This is all really a bit clumsy and certainly unclear. This sort of case where a called method conditionally allocates a scratch register is an important one to be able to handle. I think a way to deal with this might be to allow methods like cmpxchg to declare a local scratch register which /uses/ a supplied register if provided and only /allocates/ a specific scratch register if noreg is supplied. That would allow the caller to allocate a scratch register and pass it in or not allocate the register and pass in noreg. Of course it also allows a caller to pass in a non-scratch register. So, given the above definition and your current definition for cmpxchg, viz: 2497 void MacroAssembler::cmpxchg(Register addr, Register expected, 2498 Register new_val, 2499 enum operand_size size, 2500 bool acquire, bool release, 2501 bool weak, 2502 Register result) { 2503 ScratchRegister rscratch1(as(), r8); 2504 if (result == noreg) result = rscratch1; <blithely use result> we could replace them with void LIR_Assembler::casw(Register addr, Register newval, Register cmpval) { // allocate scratch register to return result // and pass it on to cset ScratchRegister rscratch1( __ as(), r8); __ cmpxchg(addr, cmpval, newval, Assembler::word, /* acquire*/ true, /* release*/ true, /* weak*/ false, rscratch1); __ cset(rscratch1, Assembler::NE); __ membar(__ AnyAny); } and void MacroAssembler::cmpxchg(Register addr, Register expected, Register new_val, enum operand_size size, bool acquire, bool release, bool weak, Register result) { ScratchRegister rscratch1(as(), /*use without alloc if reg*/ result, /*alloc if noreg*/ r8); <blithely use rscratch1 instead of result> This doesn't really work quite right because the caller has to inhibit allocation in the called method either by passing 1) an explicit non-scratch register or 2) an allocated scratch register i.e. it is not quite uniform. However, in cases where noreg is passed the implication is clear that that rscratch1 (possibly also rscratch2 if two noreg args may be passed) will be allocated by the callee as an alternative. n.b. the comment in the original version of LIR_Assembler::casw refers to LIRGenerator::atomic_cmpxchg. Do you not actually mean MacroAssembler::cmpxchg? c1_Runtime_aarch64.cpp I don't really like this: 51 int StubAssembler::call_RT(Register oop_result1, Register metadata_result, address entry, int args_size) { 52 // setup registers . . . 58 FreeScratchRegs dummy(as()); 59 ScratchRegister rscratch1(as(), r8); 60 ScratchRegister rscratch2(as(), r9); Why are the regs freed here rather than in the caller? I realise this case is special because the callee can know that scratch regs are now invalid (since we are about to plant a blr all bets are off). However, this is a disparity with other use cases where the caller needs to make the decision to free temps. Intentions and expectations would be clearer if the caller were required to explicitly release scratch vars before a call to call_RT (and then maybe reallocate them again afterwards). interp_masm_aarch64.cpp: You have these top level declarations: 47 static const Register rscratch1 = r8; 48 static const Register rscratch2 = r9; So, we don't (now or in future) use ScratchRegister here? Why not? Or is this just an expedient hack? Once again, I'd prefer scratch uses to be uniform across all the code. interp_masm_aarch64.hpp: 293 set_last_Java_frame(esp, rfp, (address) pc(), r8); So, here you are using r8 rather than r8_scratch. Was there a rationale for that? interpreterRT_aarch64.cpp: 41 static const Register rscratch1 = r8; 42 static const Register rscratch2 = r9; Same comment as above for interp_masm_aarch64.cpp. macroAssembler_aarch64.cpp: Several more occurrences of callee freeing that make sense given that a call is being planted but which I think would be clearer if done in the relevant callers: 679 void MacroAssembler::call_VM_base(Register oop_result, 680 Register java_thread, 681 Register last_java_sp, 682 address entry_point, 683 int number_of_arguments, 684 bool check_exceptions) { 685 FreeScratchRegs regs(as()); . . . 804 address MacroAssembler::emit_trampoline_stub(int insts_call_instruction_offset, 805 address dest) { . . . 821 FreeScratchRegs dummy(as()); . . . 1456 void MacroAssembler::call_VM_leaf_base(address entry_point, 1457 int number_of_arguments, 1458 Label *retaddr) { 1459 Label E, L; 1460 1461 FreeScratchRegs dummy(as()); // VM calls clobber all registers but 1462 // we preserve rscratch1. 1463 ScratchRegister rscratch1(as(), r8); . . . Also, I don't follow why you sometimes use r8 and other times r9_scratch in this file. methodHandles_aarch64.cpp Once again free in a callee that I'd prefer to see done in the callers: 97 void MethodHandles::jump_from_method_handle(MacroAssembler* _masm, Register method, Register temp, 98 bool for_compiler_entry) { 99 FreeScratchRegs dummy(__ as()); 100 ScratchRegister rscratch1(__ as(), r8); 128 void MethodHandles::jump_to_lambda_form(MacroAssembler* _masm, 129 Register recv, Register method_temp, 130 Register temp2, 131 bool for_compiler_entry) { 132 FreeScratchRegs(__ as()); stubGenerator_aarch64.cpp Two cases where it is clear a more selective release would be useful 2104 { 2105 FreeScratchRegs dummy(__ as()); 2106 ScratchRegister rscratch2(__ as(), r9); . . . 2188 { 2189 FreeScratchRegs dummy(__ as()); 2190 ScratchRegister rscratch2(__ as(), r9); templateTable_aarch64.cpp 46 static const Register rscratch1 = r8; 47 static const Register rscratch2 = r9; Same comment as above for interp_masm_aarch64.cpp.
On 11/5/19 12:24 PM, Andrew Dinn wrote:
I think this is a very good start to remedying a nasty problem. However, I'm not yet convinced the model for how you manage and use scratch registers is the best way to do this. I'll explain why below and suggest a variation that might or might not work. If it doesn't then correct me and that will at least help understand why you ended up with the current design. If it does work or, at least, suggests how to move towards something better then let's try another round.
What troubles me most of all is that the mechanism you have provided can be quite opaque about ownership/liveness of registers. I think that happens because at root it does not require that declaration + allocation, release and subsequent re-allocation are co-located in the same program scope (or,at least, in the same method).
Let's work through some examples to clarify that view. As it stands it is possible to allocate some scratch registers in a caller method and then release those registers in a called method (possibly more than one call down the stack) -- indeed, the latter operation occurs in this code from the patch:
Absolutely so, yes. Note that at this point I am not trying to reorganize code but to make is clear(er) what is going on, and when a programmer does something random to force that programmer to declare that randomness. Abuses are possible, I agree.
c1_LIRAssembler_aarch64.cpp:
960 void LIR_Assembler::mem2reg(LIR_Opr src, LIR_Opr dest, BasicType type, LIR_PatchCode patch_code, CodeEmitInfo* info, bool wide, bool /* unaligned */) { 961 LIR_Address* addr = src->as_address_ptr(); . . . 976 int null_check_here = code_offset(); 977 FreeScratchRegs dummy(__ as()); . . .
ote that we are not in the scope of a visible ScratchRegister declaration here. So, the use fo a FreeScratchRegs declaration implies that we may be in the scope of a declaration up the call chain. If that may be the case then how can the callee safely cancel that allocation? How can it be sure that the caller is not relying on a scratch register retaining its value across the call.
It can't. "Naked" FreeScratchRegs are an abomination to be be avoided wherever possible. The only really justified use of them is when we're making a callout to runtime code, at which point the programmer is expected to know that the native ABI will clobber the scratch regs. I don't want to clutter every single call to the runtime with FreeScratchRegs. I don't think it would help anyone.
Likewise from the POV of the caller which might have declared a ScratchRegister there is nothing in the code (except perhaps for comments) to indicate that said scratch register might be overwritten under the call to this method (or perhaps even a call to an indirect caller). To those who do not know the details of the callee a ScratchRegister declared in the caller will appear to remain valid across the call even though it may be overwritten.
Another example occurs later in the same file
1464 void LIR_Assembler::emit_opTypeCheck(LIR_OpTypeCheck* op) { 1465 const bool should_profile = op->should_profile(); 1466 1467 LIR_Code code = op->code(); 1468 if (code == lir_store_check) { . . . 1495 if (should_profile) { 1496 Label not_null; 1497 __ cbnz(value, not_null); 1498 // Object is null; update MDO and exit 1499 ScratchRegister rscratch1(__ as(), r8); 1500 ScratchRegister rscratch2(__ as(), r9); . . . . . . 1554 } else if (code == lir_checkcast) { 1555 FreeScratchRegs dummy(__ as()); . . . 1564 } else if (code == lir_instanceof) { 1565 FreeScratchRegs dummy(__ as()); . . .
Once again the release operations occur in a scope where there is no prior declaration in the same scope. So, this use of release declarations in the else-if clauses implies that a caller may have allocated a scratch register yet they presume it is safe to release it.
That one looks like a mistake. It's perfectly possible to FreeScratchRegs where none are allocated. I don't believe that any registers will be allocated before the start of emit_opTypeCheck().
Meanwhile, to make matters worse, the ScratchRegister declarations in the initial if clause imply that a caller which exercises this path will /not/ have declared a scratch register. That further implies that the validity of these combined assumptions about what declarations might be in scope depends on the callee's conditional logic. Now that's starting to get a tad too opaque and I fear we are heading down a garden path.
Indeed.
Of course, this may just be down to the fact that the inclusion of those release operations was superfluous; that they could be removed from these examples. However, I don't like the fact that one can construct such examples. I would much prefer it if this usage was avoided where possible by requiring the release to occur in the scope where the scratch register was declared. i.e. the caller must arrange to free scratch registers if a caller might need them. That would make it much easier to detect circumstances where the caller was trying to have its cake and eat it.
Sure, I'm happy with that in all cases except in the special case of the runtime calls.
This model of usage is indeed what happens much of the time e.g. also from the same file:
1280 void LIR_Assembler::type_profile_helper(Register mdo, 1281 ciMethodData *md, ciProfileData *data, 1282 Register recv, Label* update_done) { 1283 ScratchRegister rscratch1(__ as(), r8); 1284 ScratchRegister rscratch2(__ as(), r9); . . . 1293 { 1294 FreeScratchRegs dummy(__ as()); . . . 1298 } . . .
With this usage it is very clear that any scratch value established before line 1294 cannot be relied on in code after line 1298.
That's what it's supposed to look like! I don't really believe that it makes sense for us to prevent "Naked" FreeScratchRegs automagically, but it would be good to reject such things in code review
I suspect that there are circumstances where a caller can or even must free scratch registers down the call tree from a allocation so perhaps we cannot afford always to dispense with this current usage for FreeScratchRegs altogether. However, I'd prefer it if an explicit scope-local release was used for wherever possible to avoid this 2nd-guessing of callers' intentions. In which case, it would be much better if we could use the grammar of language definitions to support that .. .
Now that's a good idea. I'm not sure how you'd actually enforce it in C++, but you could have something like CalloutFreeScratchRegs for callouts. Actually, maybe I *can* think of a way to do it with macro magic.
. . . which also provides the opportunity to remedy another shortcoming. The current release operation does not specify which scratch register(s) is (are) to be released. The only option is free all registers. However, if a release operation must always be located in the scope of a ScratchRegister then it is perfectly possible to do so by passing the relevant ScratchRegister as a parameter. So, perhaps we could use macros like:
FreeScratchRegister dummy(rscratch1)
FreeScratchRegisters dummy(rscratch1, rscratch2);
Sure, it is, but IMO it's an over-elaboration. When calling down to a sub-macro it's easer to follow what's going on if you just say "this macro clobbers scratch", but I won't fight you if you're really keen to do that.
That makes it harder for scratch registers to be freed down the stack in a caller because the caller will not have access to rscratch1 and rscratch2 unless they are passed in by reerence or pointer. I'm assuming code won't pass a ScratchRegister as a call argument, merely the encapsulated Register (we might perhaps be able to knobble that by ensuring that copying/dereferencing reset the register to noreg).
Yeah. Passing scratch registers by other names is one of the worst abuses I've had to deal with, and after some version of this patch goes in such things may be fixed.
If this revised model works then the current FreeScratchRegs really needs renaming to something less appealing like DeleteCallerScratchBindings that won't tempt anyone to use it.
That's a nice idea.
Also, it would perhaps be useful to provide a reverse operation for re-allocating a previouly declared and freed scratch register e.g.
public void Foo::bar(...) ScratchRegister rscratch1(...); ScratchRegister rscratch2(...); . . . <use rscratch1 and rscratch2> . . . FreeScratchRegister dummy(rscratch1); call_user_of_rscratch1(...); UseScratchRegister dummy2(rscratch1); ...
No: too complicated, excessive API service. Using nested scopes for allocation and release corresponds will with the usage of C++, and that's a good thing.
This explicitly acknowledges that the call may want to use rsctratch1 and explicitly signals in the caller code that rscratch1 cannot be expected to remain constant across the call. Of course, one can always achieve the same effect with block nesting but I think this is neater.
I don't. That's what nesting is for!
There is a bit more to this to do with conditional allocation of a scratch register that I will discuss in the detailed comments (for c1_LIRAssembler) but that needs to be done in context so ...
... on to the specific comments:
aarch64.ad:
Firstly, just a humdrum error. You missed a conversion for a specific use of rscratch1
2851 enc_class aarch64_enc_stlrb(iRegI src, memory mem) %{ 2852 MOV_VOLATILE(as_Register($src$$reg), $mem$$base, $mem$$index, $mem$$scale, $mem$$disp, 2853 rscratch1, stlrb); 2854 %}
This is still passing rscratch1 in the MOV_VOLATILE macro call where subsequent encoding classes use r8_scratch. It doesn't actually matter (i.e. it doesn't fail to compile) because MOV_VOLATILE ignores the SCRATCH argument. I know you are trying to minimize changes but dropping that argument in all uses would be a whole lot better . . . yes, maybe in the next patch.
Yes.
This story continues . . .
2930 enc_class aarch64_enc_fldars(vRegF dst, memory mem) %{ 2931 MOV_VOLATILE(r8_scratch, $mem$$base, $mem$$index, $mem$$scale, $mem$$disp, 2932 r8_scratch, ldarw); 2933 __ fmovs(as_FloatRegister($dst$$reg), r8_scratch); 2934 %}
I'm not sure why you invented the name r8_scratch as an alias for r8. Does that really help? I couldn't discern why you sometimes used one and why sometimes the other. If there is a rationale it probaly ought to be documented somewhere (at least at the point where r8_scratch aand r9_scratch are declared).
I'm not sure, which is perhaps why it wasn't used consistently. Just a reminder, I guess.
Anyway, what I don't follow is why are these uses just employ a bare register name. Why are they exempt from the need to declare a scratch register for r8 before using it? Why not:
enc_class aarch64_enc_fldars(vRegF dst, memory mem) %{ ScratchRegister rscratch1(__ as(), r8); MOV_VOLATILE(rscratch1, $mem$$base, $mem$$index, $mem$$scale, $mem$$disp, rscratch1, ldarw); __ fmovs(as_FloatRegister($dst$$reg), rscratch1); . . .
or, if need be
enc_class aarch64_enc_fldars(vRegF dst, memory mem) %{ { ScratchRegister rscratch1(__ as(), r8); MOV_VOLATILE(rscratch1, $mem$$base, $mem$$index, $mem$$scale, $mem$$disp, rscratch1, ldarw); __ fmovs(as_FloatRegister($dst$$reg), rscratch1); } . . .
I guess I could live with that. I don't want to introduce extra noise where it can be avoided.
This is not the only place where your usage is confusing. You do (although not consistently) use declarations in some of the later encodings. For example, further down the file:
3526 enc_class aarch64_enc_fast_lock(iRegP object, iRegP box, iRegP tmp, iRegP tmp2) %{ . . . 3570 // markWord of object (disp_hdr) with the stack pointer. 3571 __ mov(r8_scratch, sp); 3572 __ sub(disp_hdr, disp_hdr, r8_scratch); . . .
has no declaration ... but is immediately followed by
3604 enc_class aarch64_enc_fast_unlock(iRegP object, iRegP box, iRegP tmp, iRegP tmp2) %{ . . . 3644 ScratchRegister rscratch1(__ as(), r8); . . .
Fair enough. It's PoC, WIP. :-)
Declarations are also used in some of the instruction definitions:
12103 instruct rolI_rReg(iRegINoSp dst, iRegI src, iRegI shift, rFlagsReg cr) . . . 12109 ins_encode %{ 12110 ScratchRegister rscratch1(__ as(), r8); 12111 __ subw(rscratch1, zr, as_Register($shift$$reg)); . . .
Is there a clear rationale for whether or not to declare a ScratchRegister that I am missing? I'd be happier with them being used everywhere, avoiding explicit mention of Register names at points of use.
No, I might have just changed my mind partway through. In principle, for the sake of the sanity of the maintenance programmer, it might be simplest to insist that scratch register declarations are always used in AD files. It would mean a change which moves the register tracker from Assembler to CodeBuffer, because C2 creates Assemblers on the fly, but such a change is not hard to do.
Also, one other thing I don't understand but I think is just an error:
3008 enc_class aarch64_enc_stlxr(iRegLNoSp src, memory mem) %{ 3009 MacroAssembler _masm(&cbuf); 3010 Register src_reg = as_Register($src$$reg); 3011 Register base = as_Register($mem$$base); 3012 ScratchRegister rscratch2(__ as(), r9); . . . 3017 if (disp != 0) { 3018 __ lea(r9_scratch, Address(base, disp)); 3019 __ stlxr(r8_scratch, src_reg, r9_scratch); . . .
Why are you declaring r9 as temp rscratch2, not declaring r8 as rscratch1 and then using both as r9_scratch and r8_scratch? Note also that in the previous encoding (aarch64_enc_ldaxr) you use r9_scratch and r8_scratch and don't declare any ScratchRegister.
I'm sure there was a reason, but...
Why name the scratch register r8_scratch in this decl (shadowing an existing name) rather than calling it rscratch1?
That too.
c1_LIRAssembler_aarch64.cpp:
I found the comment here to be misleading
1581 void LIR_Assembler::casw(Register addr, Register newval, Register cmpval) { 1582 // r8 is used to pass an argument here, not as scratch. See 1583 // LIRGenerator::atomic_cmpxchg. 1584 __ cmpxchg(addr, cmpval, newval, Assembler::word, /* acquire*/ true, /* release*/ true, /* weak*/ false, r8); 1585 __ cset(r8, Assembler::NE); 1586 __ membar(__ AnyAny); 1587 }
Firstly, to be more strict/precise, r8 is used to return a result from cmpxchg in order to then pass it on to cset (it read to me like you were saying that it passes a value into cmpxchg).
Passing scratch registers to macros as arguments is really awkward. It's just about the most confusing and error-prone thing you can do. In a later patch I'd like to get rid of all of it. Probably the greatest contribution of this work is that it detects and forces us to do something about all such usages.
More importantly, what the comment does not clarify is that the reason you cannot allocate r8 as a scratch register here at the point of call with a ScratchRegister decl is because cmpxchg itself conditionally reserves r8 as a scratch register in the case where a client passes noreg. So using a ScratchRegister here would break cmpxchg and not using a ScratchRegister in cmpxchg would disallow other callers from passing in noreg or require it to provide two paths to handle the two cases, noreg or a scratch reg, allocating a ScratchRegister in only one of those paths. This is all really a bit clumsy and certainly unclear.
Right. The problem here is that the way the registers are used is confusing and clumsy; the declarations (and comments) reflect that clumsiness.
This sort of case where a called method conditionally allocates a scratch register is an important one to be able to handle. I think a way to deal with this might be to allow methods like cmpxchg to declare a local scratch register which /uses/ a supplied register if provided and only /allocates/ a specific scratch register if noreg is supplied. That would allow the caller to allocate a scratch register and pass it in or not allocate the register and pass in noreg. Of course it also allows a caller to pass in a non-scratch register.
Better still, I think, for the caller to allocate the scratch register and pass it down under the name rscratch1; ownership remains with the caller.
So, given the above definition and your current definition for cmpxchg, viz:
2497 void MacroAssembler::cmpxchg(Register addr, Register expected, 2498 Register new_val, 2499 enum operand_size size, 2500 bool acquire, bool release, 2501 bool weak, 2502 Register result) { 2503 ScratchRegister rscratch1(as(), r8); 2504 if (result == noreg) result = rscratch1; <blithely use result>
we could replace them with
void LIR_Assembler::casw(Register addr, Register newval, Register cmpval) { // allocate scratch register to return result // and pass it on to cset ScratchRegister rscratch1( __ as(), r8); __ cmpxchg(addr, cmpval, newval, Assembler::word, /* acquire*/ true, /* release*/ true, /* weak*/ false, rscratch1); __ cset(rscratch1, Assembler::NE); __ membar(__ AnyAny); }
and
void MacroAssembler::cmpxchg(Register addr, Register expected, Register new_val, enum operand_size size, bool acquire, bool release, bool weak, Register result) { ScratchRegister rscratch1(as(), /*use without alloc if reg*/ result, /*alloc if noreg*/ r8); <blithely use rscratch1 instead of result>
This doesn't really work quite right because the caller has to inhibit allocation in the called method either by passing 1) an explicit non-scratch register or 2) an allocated scratch register i.e. it is not quite uniform. However, in cases where noreg is passed the implication is clear that that rscratch1 (possibly also rscratch2 if two noreg args may be passed) will be allocated by the callee as an alternative.
Hmm, maybe. I suggest that we put up with the ugliness of nakedly using r8 for now (accompanied by suitable comments) and then fix it up later.
n.b. the comment in the original version of LIR_Assembler::casw refers to LIRGenerator::atomic_cmpxchg. Do you not actually mean MacroAssembler::cmpxchg?
I can't remember.
c1_Runtime_aarch64.cpp
I don't really like this:
51 int StubAssembler::call_RT(Register oop_result1, Register metadata_result, address entry, int args_size) { 52 // setup registers . . . 58 FreeScratchRegs dummy(as()); 59 ScratchRegister rscratch1(as(), r8); 60 ScratchRegister rscratch2(as(), r9);
Why are the regs freed here rather than in the caller? I realise this case is special because the callee can know that scratch regs are now invalid (since we are about to plant a blr all bets are off). However, this is a disparity with other use cases where the caller needs to make the decision to free temps. Intentions and expectations would be clearer if the caller were required to explicitly release scratch vars before a call to call_RT (and then maybe reallocate them again afterwards).
No, I don't want to do that, as I said above. It's a callout, which nukes more than just the scratch registers.
interp_masm_aarch64.cpp:
You have these top level declarations:
47 static const Register rscratch1 = r8; 48 static const Register rscratch2 = r9;
So, we don't (now or in future) use ScratchRegister here? Why not? Or is this just an expedient hack? Once again, I'd prefer scratch uses to be uniform across all the code.
It's very expedient, and in particular the interpreter has its own convention for register usage. I don't believe that rewriting all of the templates etc. to use scratch register declarations would reduce errors but it might even introduce them. I could live with the interpreter being changed later, but there's no need for it now.
interp_masm_aarch64.hpp:
293 set_last_Java_frame(esp, rfp, (address) pc(), r8);
So, here you are using r8 rather than r8_scratch. Was there a rationale for that?
It's because of declaration is in the wrong place. If we moved it, it could just use the same name as the rest of the interpreter.
interpreterRT_aarch64.cpp:
41 static const Register rscratch1 = r8; 42 static const Register rscratch2 = r9;
Same comment as above for interp_masm_aarch64.cpp.
macroAssembler_aarch64.cpp:
Several more occurrences of callee freeing that make sense given that a call is being planted but which I think would be clearer if done in the relevant callers:
See above.
Also, I don't follow why you sometimes use r8 and other times r9_scratch in this file.
Some cleanups required.
methodHandles_aarch64.cpp
Once again free in a callee that I'd prefer to see done in the callers:
97 void MethodHandles::jump_from_method_handle(MacroAssembler* _masm, Register method, Register temp, 98 bool for_compiler_entry) { 99 FreeScratchRegs dummy(__ as()); 100 ScratchRegister rscratch1(__ as(), r8);
There's even less justification for putting FreeScratchRegs in the caller at this point: jump_from_method_handle is not going to return.
stubGenerator_aarch64.cpp
Two cases where it is clear a more selective release would be useful
2104 { 2105 FreeScratchRegs dummy(__ as()); 2106 ScratchRegister rscratch2(__ as(), r9); . . . 2188 { 2189 FreeScratchRegs dummy(__ as()); 2190 ScratchRegister rscratch2(__ as(), r9);
Maybe, but it would make for a more complicated API, which I'm not convinced would really carry its weight. I can go with a more complicated FreeScratchRegs which says what you've freed rather than what you've reserved if you're really keen. NBut it's not obvious to me that FreeScratchRegs dummy(r8, __ as()); is really better than FreeScratchRegs dummy(__ as()); ScratchRegister rscratch2(__ as(), r9); ... especially since the latter actually relies on the programmer reading upwards to see that r9 is allocated.
templateTable_aarch64.cpp
46 static const Register rscratch1 = r8; 47 static const Register rscratch2 = r9;
Same comment as above for interp_masm_aarch64.cpp.
Same reply. :-) Thanks. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. <https://www.redhat.com> https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
On 11/5/19 2:58 PM, Andrew Haley wrote:
No: too complicated, excessive API service.
s/service/surface/ -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. <https://www.redhat.com> https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
One other thing: this exercise has shown that in many cases we trash scratch registers in places where it really doesn't matter, and we'd be much better off rewriting them not to do so. This makes push_call_clobbered_registers() something that can safely be used everywhere. But I'm holding off any of this because I want the first patch to be, if at all possible, neutral with regard to code generated. diff -r 33f9271b3167 src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp --- a/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp Mon Nov 04 13:13:34 2019 -0500 +++ b/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp Wed Nov 06 08:36:08 2019 -0500 @@ -2624,15 +2624,17 @@ int step = 4 * wordSize; push(RegSet::range(r0, r18) - RegSet::of(rscratch1, rscratch2), sp); sub(sp, sp, step); - mov(rscratch1, -step); + mov(r0, -step); // Push v0-v7, v16-v31. for (int i = 31; i>= 4; i -= 4) { if (i <= v7->encoding() || i >= v16->encoding()) st1(as_FloatRegister(i-3), as_FloatRegister(i-2), as_FloatRegister(i-1), - as_FloatRegister(i), T1D, Address(post(sp, rscratch1))); + as_FloatRegister(i), T1D, Address(post(sp, r0))); } st1(as_FloatRegister(0), as_FloatRegister(1), as_FloatRegister(2), as_FloatRegister(3), T1D, Address(sp)); + // Reload r0 from where it was saved before pushing v0-v7, v16-v31. + ldr(r0, Address(sp, (8 + 16) * wordSize)); } -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. <https://www.redhat.com> https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
On 06/11/2019 13:43, Andrew Haley wrote:
One other thing: this exercise has shown that in many cases we trash scratch registers in places where it really doesn't matter, and we'd be much better off rewriting them not to do so.
Agreed.
This makes push_call_clobbered_registers() something that can safely be used everywhere. But I'm holding off any of this because I want the first patch to be, if at all possible, neutral with regard to code generated. . . . Yes, that patch is an improvement.
regards, Andrew Dinn -----------
participants (3)
-
Andrew Dinn
-
Andrew Haley
-
Nick Gasson