[External] : Re: premain: Possible solutions to use runtime G1 region grain size in AOT code (JDK-8335440)

Thu Aug 15 10:18:06 UTC 2024

On 15/08/2024 00:50, Vladimir Kozlov wrote:
> Do we really need AOTRuntimeConstants instance be a stub in CodeCache?
> Looking on this code and I think this complicate implementation if use 
> ExternalAddress relocation anyway. Can it be static instance in 
> SCCache.cpp which we initialize the same way as in current changes?

Yes, you are right: we don't need the AOTRuntimeConstants struct to be 
embedded in the cache if we are using an ExternalRelocation. The initial 
idea for doing that was because it provides a 32-bit bound on the offset 
from any struct field to a corresponding field load (lea). That would 
allow us to use PC-relative loads of the field addresses -- which saves 
instruction footprint on AArch64, possibly also on x86? However, we can 
only do that if we also implement a new relocation (which seems 
unnecessary).

So, we will redefine the AOTRuntimeConstants struct as a C++ static field.

> I do not understand why you have concern about having separate 
> relocation for each constant in AOTRuntimeConstants. We have only 2 now. 
> Even they grow I don't think we will have a lot of them. Actually the 
> question: will it simplify code if each constant's address is treated as 
> separate and not offset in AOTRuntimeConstants? Instead of loading 
> AOTRuntimeConstants address and then adding offset we just load address 
> of constant.

Yes, you are right again: if we register the address of every field in 
the struct in the SCC address table then that will simplify the back end 
code. I was not sure how much we wanted to rely on address table 
registration but given that we *are* reliant on it just now then we 
might as well have the benefit.

So, we will do the extra registration and limit the back end changes to 
detect ConP occurrences in the AOTRuntimeConstants range and load them 
using an ExternalAddress reloc.

> I don't understand question about C1 (I am not familiar with it). What 
> it generates for shifts now? Why you need 3-operand shift?

I'm not sure '3-operand' is the right term to use to describe what is at 
stake here. It's really about the kind of operand rather than their 
number. The current region/card shift operations employ an immediate 
shift count. The AOT region/card shift operations need to be fed the 
shift count via a register (the target of the constant load).

C1 does currently generate shifts where the shift count is in a register 
but it only does so using the flavours of shift (sar/sal/shr/shl) that 
require the register to be ECX. It doesn't currently generate the 
alternative instructions (sarx/shlx/shrx) that accept a shift count in 
any register. I'm not 100% sure why but I think this is because not all 
CPUs implement them (they require CPUID feature BM2) or it may be to do 
with them setting/not setting flags.

Ashu is proposing adding support for these latter instructions to C1 so 
the barrier code does not have to pin the shift count to ECX. Do we 
care? Or can we live with the effect of pinning?

> Code style: I noticed and aarch64.ad still using old stile of 
> instructions definition using `enc_class`. We don't do this any more (at 
> least in x86*.ad) when we adding new instructions with one 
> implementation. Please use `ins_encode %{ %}` in such cases.

Sure, we will fix that.

> I think, if you need additional register you should push/pop to 
> guarantee we don't stomp over some used value.
Ok, we will play safe and do that.

One final comment: Much of this discussion is going to be side-stepped 
by the G1 late barrier implementation that is currently working its way 
into mainline. That will remove the need to inject barrier IR subgraphs 
into the C1 and C2 IR graphs and eliminate most (all?) of the associated 
C2 reduction rules from the ad files. I think we should still update the 
patch to accommodate the feedback above. We can simplify it further once 
the G1 changes come in from mainline and pursue in parallel the idea of 
accommodating changes to oop encodings using this model.

regards,

Andrew Dinn
-----------