premain: Possible solutions to use runtime G1 region grain size in AOT code (JDK-8335440)
Vladimir Kozlov
vladimir.kozlov at oracle.com
Wed Jul 17 17:27:00 UTC 2024
> We don't have such a reloc at present..
What about section_word_Relocation, so we can put the grain value into the constants section?
Thanks,
Vladimir K
On 7/17/24 3:15 AM, Andrew Dinn wrote:
> Hi Ioi,
>
> On 16/07/2024 17:33, ioi.lam at oracle.com wrote:
>>
>> On 7/15/24 9:23 AM, Andrew Dinn wrote:
>>> . . .
>>> The second solution modifies barrier generation when the SCCache is open for writing to load the shift count from a
>>> runtime field, G1HeapRegion::LogHRGrainSize i.e. the same field that determines the immediate count used for normal
>>> generation. In order to make this visible to the compilers and SCC address table the address of this field is
>>> exported via the card table. This solution requires the AOT code to reference the target address using a runtime
>>> address relocation. Once again, if the SCCache is not open for writing the count is generated as normal i.e. as an
>>> immediate operand.
>>>
>>>
>> Is the G1HeapRegion::LogHRGrainSize loaded with PC offset?
>>
>> ldr grain, [pc, #5678]
>
> That's not what this option does. The barrier loads the grain size indirectly via a constant static field address, i.e.
> via address &G1HeapRegion::LogHRGrainSize (well, actually, the constant is determined by whatever address is reported by
> the barrier card table but effectively it is &G1HeapRegion::LogHRGrainSize). So the barrier uses a sequence
> like this
>
> movz reg, #<16bit>
> movk reg, #<16bit>, lsl #16
> movk reg, #<16bit>, lsl #32
> ldrb reg, [reg]
> . . .
> lsr reg2, reg2, reg
>
> The 16 bit quantities compose to the address of the field. The 3 x mov sequence is marked with a runtime relocation
> which ensures that it is updated when generated code is restored from the SCCache. That requires the field address to be
> inserted in the SCC address table's list of external addresses.
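
To make the "16 bit quantities compose to the address" point concrete, here is a rough C++ sketch of how an address splits into the three immediates and how the movz/movk sequence rebuilds it. It assumes a 48-bit address, and the names split_address and compose_address are made up for illustration; they are not HotSpot functions.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper: split a 48-bit address into the three 16-bit
// immediates carried by a movz / movk / movk sequence.
std::array<uint16_t, 3> split_address(uint64_t addr) {
  assert((addr >> 48) == 0 && "sketch assumes a 48-bit address");
  return { uint16_t(addr), uint16_t(addr >> 16), uint16_t(addr >> 32) };
}

// Hypothetical helper: model what the three instructions leave in the register.
uint64_t compose_address(const std::array<uint16_t, 3>& imm) {
  uint64_t reg = imm[0];          // movz reg, #imm0 (zeroes the other bits)
  reg |= uint64_t(imm[1]) << 16;  // movk reg, #imm1, lsl #16
  reg |= uint64_t(imm[2]) << 32;  // movk reg, #imm2, lsl #32
  return reg;
}
```

A runtime relocation on such a sequence would, in effect, recompute the three immediates for the field's address in the current process and rewrite them into the instructions.
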
>
> This scheme requires repeating that series of 3 x mov + ldrb instructions at every object field store in a given
> compiled method. That also implies a runtime relocation for each such sequence when the code is restored from the SCCache.
>
> With C2 the barrier manifests as a (Set dst con) for a special ConP value (operand con has type immRegionGrainShift)
> feeding a LoadB. I guess C2 might conceivably be able to optimize away some of the repeat movz/k and ldrb sequences if
> it is able to keep the address or byte value in a register or spill slot but I would not expect that to be likely.
>
>> I suppose this requires us to put multiple copies of G1HeapRegion::LogHRGrainSize inside the AOT code, as there's a
>> limit for the offset. But we will be patching fewer places than every site that needs to know the grain size.
> I think what you are suggesting here is what I described as option 4, i.e. we put the grain size in the nmethod const
> section (or in a dedicated data location for a non-nmethod blob) and insert a pc-relative load in the barrier to feed
> the lsr.
>
> With AOT code this would require a special relocation to mark the constants area slot (or the non-nmethod blob data
> slot), let's call it reloc_grain_shift_const. It would patch the constant to whatever value field
> G1HeapRegion::LogHRGrainSize has in the current runtime (or rather to whatever grain size is reported by the barrier
> card table). We don't have such a reloc at present. We do have an existing reloc for a runtime data address which is
> why I implemented option 2 first (to work out where I would need to tweak the compilers and barrier set assemblers plus
> auxiliary classes).
>
> With option 4 I believe we will only need one occurrence of the constant. On AArch64 we would use either adr or adrp +
> add to install a pc-relative address into a register and then an ldrb via that register.
>
> adr reg, #<21bits>
> ldrb reg, [reg]
> ...
> lsr reg2, reg2, reg
>
> or
>
> adrp reg, #<21bits> // selects 4KB-aligned page (low 12 bits zero)
> add reg, reg, #<12bits>
> ldrb reg, [reg]
> ...
> lsr reg2, reg2, reg
>
> The adr/adrp instructions do not need relocating which is why scheme 4 would only require 1 relocation per nmethod (or
> non-nmethod blob).
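
As a rough check on the adr versus adrp + add choice, the following C++ sketch models the address arithmetic: adr takes a signed 21-bit byte offset (about +/-1MB), while adrp takes a signed 21-bit page offset and the add supplies the low 12 bits. The struct and function names are hypothetical, chosen only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical pair of immediates for an adrp + add sequence.
struct AdrpAdd {
  int64_t  page_delta;  // signed 21-bit count of 4KB pages (adrp)
  uint32_t low12;       // offset within the target page (add)
};

// True if a single adr at 'pc' can reach 'target' (signed 21-bit byte offset).
bool adr_reaches(uint64_t pc, uint64_t target) {
  int64_t delta = int64_t(target) - int64_t(pc);
  return delta >= -(int64_t(1) << 20) && delta < (int64_t(1) << 20);
}

// Compute the adrp/add immediates needed at 'pc' to address 'target'.
AdrpAdd split_pc_relative(uint64_t pc, uint64_t target) {
  int64_t page_delta = int64_t(target >> 12) - int64_t(pc >> 12);
  assert(page_delta >= -(int64_t(1) << 20) && page_delta < (int64_t(1) << 20));
  return { page_delta, uint32_t(target & 0xFFF) };
}

// Model what the two instructions leave in the register.
uint64_t apply_adrp_add(uint64_t pc, AdrpAdd enc) {
  uint64_t page = ((pc >> 12) + uint64_t(enc.page_delta)) << 12;  // adrp
  return page + enc.low12;                                        // add
}
```

Because both immediates depend only on the distance between the instruction and the constant slot, they would stay valid wherever the code is loaded, provided the code and its constants move together.
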
>
> Option 3 involves generating the normal barrier
>
> lsr reg, reg, #imm
>
> The difference is that for AOT code we would mark the instruction with a new relocation, let's call it
> reloc_grain_shift_immediate. Patching for this reloc would assert that the corresponding instruction is a shift and
> that the current GC barrier set is using a card table. It would update the immediate operand with whatever grain size
> shift was reported by the card table.
>
> Like scheme 2 this would require a reloc for every object field write in an nmethod (or non-nmethod blob).
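
For illustration only, patching the immediate for option 3 might look roughly like the C++ sketch below. It assumes the barrier uses the 64-bit form, where lsr Xd, Xn, #shift is an alias of ubfm Xd, Xn, #shift, #63, so the shift count sits in the immr field (bits 21..16). The function and constant names are invented for this sketch and do not exist in HotSpot.

```cpp
#include <cassert>
#include <cstdint>

// Fixed bits of a 64-bit "lsr Xd, Xn, #shift" (ubfm with imms == 63).
// The mask covers the opcode bits 31..22 and the imms field bits 15..10.
constexpr uint32_t kLsrImm64Mask = 0xFFC0FC00u;
constexpr uint32_t kLsrImm64Bits = 0xD340FC00u;

// Hypothetical check that the instruction word is a 64-bit immediate lsr.
bool is_lsr_imm64(uint32_t insn) {
  return (insn & kLsrImm64Mask) == kLsrImm64Bits;
}

// Hypothetical patch routine: replace the shift count, keep Rn and Rd.
uint32_t patch_grain_shift(uint32_t insn, unsigned shift) {
  assert(is_lsr_imm64(insn) && "reloc must point at an immediate shift");
  assert(shift < 64);
  return (insn & ~(0x3Fu << 16)) | (uint32_t(shift) << 16);
}
```

For example, 0xD355FC20 encodes lsr x0, x1, #21, and patching it with a shift of 24 yields 0xD358FC20.
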
>
> regards,
>
>
> Andrew Dinn
> -----------
>