[External] : Re: premain: Possible solutions to use runtime G1 region grain size in AOT code (JDK-8335440)

Tue Jul 23 16:12:56 UTC 2024

Hi Vladimir,

On 18/07/2024 17:15, Vladimir Kozlov wrote:
> What about allocating word in CodeCache as we do for some intrinsics 
> stubs tables? You will need to generate it only once and can use 
> runtime_type relocation to access it.

I am looking into that now. I've been working on something else that
might interest you ...

> It is all about loading with existing relocation vs specialized 
> relocation for immediate value (Option three).
> I would like to see how complex option three is.

I have an implementation of option 3 in my JDK-8335440-aot-reloc branch.
N.b. it is based on a slightly out-of-date premain, but the
implementation shows what is involved even without a rebase (I'll do
that soon):

https://github.com/openjdk/leyden/compare/premain...adinn:leyden:JDK-8335440-aot-reloc?expand=1

Basically, this solution emits an aot_reloc for the GC barrier right
shift immediate instruction when we are generating AOT code
(StoreCachedCode == true). When loading AOT code (LoadCachedCode ==
true), any right shift immediate tagged with an aot_reloc has its
operand patched with the log region grain size of the current JVM. I
use a format field to identify which reloc is required, so the same
model will support other AOT load-time relocs if need be.
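
To give a flavour of the load-time patching, here is a standalone sketch.
It is not the code on the branch: AotRelocFormat, AotReloc and
apply_aot_relocs are made-up names, and the instruction is modelled on
the AArch64 64-bit LSR immediate (an alias of UBFM with the shift
amount in the immr field, bits 21:16), which is the kind of unsigned
right shift the G1 region check uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reloc kinds: the format field says what has to be patched.
enum class AotRelocFormat : uint8_t { G1GrainShift = 0 };

// Hypothetical reloc record: which instruction, and how to patch it.
struct AotReloc {
  size_t offset;          // index of the instruction in the code buffer
  AotRelocFormat format;
};

// Encode AArch64 `lsr Xd, Xn, #shift`, i.e. UBFM Xd, Xn, #shift, #63.
inline uint32_t encode_lsr64(unsigned rd, unsigned rn, unsigned shift) {
  assert(rd < 32 && rn < 32 && shift < 64);
  return 0xD3400000u | (shift << 16) | (63u << 10) | (rn << 5) | rd;
}

// Read back the shift amount (the immr field, bits 21:16).
inline unsigned lsr64_shift(uint32_t insn) { return (insn >> 16) & 0x3Fu; }

// Overwrite only the shift amount, leaving registers and opcode alone.
inline uint32_t patch_lsr64_shift(uint32_t insn, unsigned shift) {
  assert(shift < 64);
  return (insn & ~(0x3Fu << 16)) | (shift << 16);
}

// Load-time pass: visit every tagged instruction and patch it according
// to its format, using the value from the JVM that is loading the code.
inline void apply_aot_relocs(std::vector<uint32_t>& code,
                             const std::vector<AotReloc>& relocs,
                             unsigned runtime_log_grain) {
  for (const AotReloc& r : relocs) {
    switch (r.format) {
      case AotRelocFormat::G1GrainShift:
        code.at(r.offset) = patch_lsr64_shift(code.at(r.offset),
                                              runtime_log_grain);
        break;
    }
  }
}
```

Code stored with a 1MB grain (shift 20) and loaded into a JVM using 4MB
regions would be patched by calling apply_aot_relocs(code, relocs, 22).
Only the tagged shifts change. Supporting another load-time reloc would
mean adding one more format value and one more case.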

The implementation makes it fairly obvious that we could use the same
technique of tagging with an aot_reloc at StoreCachedCode time and
patching at LoadCachedCode time for any other instruction (or
indivisible instruction sequence) which 1) encodes some runtime constant
from the original JVM and 2) can be patched to use the corresponding
value in the new JVM.

It is worth noting that for both C1 and C2 generated code I had to set a
tag on the LIR node (c1_LIROp in C1, URShiftNode in C2) in order to mark
it as a relocatable instruction. Later on, in the generation phase, I
detect the mark and emit a reloc to the code buffer. This works because
none of the LIR transforms modify the right shift node by merging it
into some other operation or by merging another operation into it (I
checked, but this is a contingent fact based on the current state of
the code).
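
The tag-then-emit flow can be sketched in isolation as follows. This is
a toy model, not the C1/C2 code: ShiftOp, EmittedCode and emit are
hypothetical names, and the instruction encoding is invented purely so
the example is self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical IR node for a shift by an immediate. The flag is the
// mark set when the barrier is built.
struct ShiftOp {
  unsigned dst;
  unsigned src;
  unsigned shift;
  bool aot_patchable;
};

// Output of the generation phase: instructions plus the offsets of the
// ones that need patching when the code is loaded.
struct EmittedCode {
  std::vector<uint32_t> insns;
  std::vector<size_t> aot_reloc_offsets;
};

// Generation phase: when storing AOT code, record a reloc for every
// node that still carries the mark, then emit the instruction.
inline EmittedCode emit(const std::vector<ShiftOp>& ops,
                        bool store_cached_code) {
  EmittedCode out;
  for (const ShiftOp& op : ops) {
    if (store_cached_code && op.aot_patchable) {
      out.aot_reloc_offsets.push_back(out.insns.size());
    }
    // Toy encoding, not a real ISA: shift | src | dst packed in a word.
    out.insns.push_back((op.shift << 16) | (op.src << 8) | op.dst);
  }
  return out;
}
```

The point of the sketch is that the reloc is only emitted if the mark
is still on the node when generation runs, so any earlier transform
that rewrites or replaces the node silently loses it.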

Clearly, in the case of barrier patching we could finesse the above 
problem by generating the GC barrier late enough to bypass graph 
normalization and back end reductions. However, if we want to use a 
similar technique to AOT patch other instructions or sequences then we 
will need a more reliable way of ensuring that the relocatable 
instructions are not replaced or merged during normalization/final 
reduction.
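
One possible shape for such a guarantee, purely as an illustration (the
Shift type and fold_shifts pass are hypothetical, and this is not
something the branch does), is for every merging transform to check the
mark and leave tagged nodes alone:

```cpp
#include <cassert>
#include <vector>

// Hypothetical node in a chain of unsigned right shifts of one value.
struct Shift {
  unsigned amount;
  bool aot_patchable;  // must survive as its own instruction
};

// Toy peephole: (x >>> a) >>> b == x >>> (a + b) when a + b < 64.
// Tagged shifts are never folded, so their immediate still exists as a
// separate operand that can be patched when the code is loaded.
inline std::vector<Shift> fold_shifts(const std::vector<Shift>& chain) {
  std::vector<Shift> out;
  for (const Shift& s : chain) {
    bool can_fold = !out.empty() &&
                    !out.back().aot_patchable &&
                    !s.aot_patchable &&
                    out.back().amount + s.amount < 64;
    if (can_fold) {
      out.back().amount += s.amount;
    } else {
      out.push_back(s);
    }
  }
  return out;
}
```

The cost of this approach is that every transform which might touch a
relocatable node has to know about the mark, which is the reliability
concern raised above.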

regards,

Andrew Dinn
-----------