premain: Possible solutions to use runtime G1 region grain size in AOT code (JDK-8335440)
Andrew Dinn
adinn at redhat.com
Thu Jul 18 11:00:26 UTC 2024
On 17/07/2024 18:27, Vladimir Kozlov wrote:
> > We don't have such a reloc at present.
>
> What about section_word_Relocation so we can put grain value into
> constants section?
I agree that when compiling an nmethod we would need to use a
section_word_type reloc to mark the adrp that accesses the constant.
That would ensure that the offset used by the adrp is kept consistent
across buffer resizes and at install when the displacement may change.
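
To make the displacement point concrete, here is a minimal sketch of how the adrp + add immediates depend on where the instruction and the constant end up. The names AdrpAdd, split_pc_relative and resolve are hypothetical and not HotSpot code, and the range limit of the real 21-bit adrp field is not checked.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical holder for the two immediates of an adrp + add pair.
struct AdrpAdd {
    int64_t  page_delta;  // adrp: target page minus instruction page, in 4 KiB pages
    uint32_t low12;       // add: low 12 bits of the target address
};

// Compute the immediates needed to reach `target` from an adrp located at `pc`.
AdrpAdd split_pc_relative(uint64_t pc, uint64_t target) {
    int64_t delta = static_cast<int64_t>(target >> 12) -
                    static_cast<int64_t>(pc >> 12);
    return { delta, static_cast<uint32_t>(target & 0xfff) };
}

// Rebuild the address the adrp + add pair would leave in the register.
uint64_t resolve(uint64_t pc, const AdrpAdd& a) {
    uint64_t page = (pc & ~uint64_t{0xfff}) +
                    static_cast<uint64_t>(a.page_delta * 4096);
    return page + a.low12;
}
```

If the constant moves relative to the adrp, both immediates can change, which is the case the reloc on the adrp has to cover.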
However, what I was talking about was a new reloc, needed only when the
SCCache restores code, which would mark the constant itself. When AOT
code is restored we need to ensure any such constant is rewritten using
the runtime grain size.
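
As a rough illustration of what such a reloc would do at restore time, the sketch below walks a list of reloc records and overwrites each marked byte slot with the runtime grain shift. Everything here is an assumption made for illustration: RelocKind, Reloc, RestoredBlob and patch_grain_shift are hypothetical names, not existing HotSpot or SCCache types, and the constant is assumed to be a single byte because the barrier loads it with ldrb.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reloc kinds; only the idea of a dedicated reloc that marks
// the grain shift constant comes from the discussion.
enum class RelocKind : uint8_t { none, grain_shift_const };

struct Reloc {
    RelocKind kind;
    size_t    offset;  // byte offset of the marked slot within the blob
};

// Stand-in for an nmethod or non-nmethod blob restored from the cache.
struct RestoredBlob {
    std::vector<uint8_t> bytes;
    std::vector<Reloc>   relocs;
};

// Overwrite every marked slot with the grain shift of the current runtime.
// Returns the number of slots patched.
size_t patch_grain_shift(RestoredBlob& blob, uint8_t runtime_log_grain_size) {
    size_t patched = 0;
    for (const Reloc& r : blob.relocs) {
        if (r.kind != RelocKind::grain_shift_const) continue;
        assert(r.offset < blob.bytes.size());
        blob.bytes[r.offset] = runtime_log_grain_size;
        ++patched;
    }
    return patched;
}
```

Slots that carry no such reloc are left untouched, so the patcher never has to guess whether a given constant is the grain size.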
We could attempt to do the rewrite of the constant as a side-effect of
processing the section_word_type reloc during code restore. However, we
would need to know for sure that the constant being accessed by the adrp
was definitely the grain size. Is that what you were thinking of, Vladimir?
Of course that would not work for stubs which need to include a barrier
and a reference to the barrier shift (I believe this only applies for
some of the memory copy stubs). In this case we would have to load the
constant from a data slot allocated in amongst the instructions. So, I
think we would not be able to identify the location of the constant with
a section_word_type reloc.
regards,
Andrew Dinn
-----------
> On 7/17/24 3:15 AM, Andrew Dinn wrote:
>> Hi Ioi,
>>
>> On 16/07/2024 17:33, ioi.lam at oracle.com wrote:
>>>
>>> On 7/15/24 9:23 AM, Andrew Dinn wrote:
>>>> . . .
>>>> The second solution modifies barrier generation when the SCCache is
>>>> open for writing to load the shift count from a runtime field,
>>>> G1HeapRegion::LogHRGrainSize i.e. the same field that determines the
>>>> immediate count used for normal generation. In order to make this
>>>> visible to the compilers and SCC address table the address of this
>>>> field is exported via the card table. This solution requires the AOT
>>>> code to reference the target address using a runtime address
>>>> relocation. Once again, if the SCCache is not open for writing the
>>>> count is generated as normal i.e. as an immediate operand.
>>>>
>>>>
>>> Is the G1HeapRegion::LogHRGrainSize loaded with PC offset?
>>>
>>> ldr grain, [pc, #5678]
>>
>> That's not what this option does. The barrier loads the grain size
>> indirectly via a constant static field address, i.e. via address
>> &G1HeapRegion::LogHRGrainSize (well, actually, the constant is
>> determined by whatever address is reported by the barrier card table
>> but effectively it is &G1HeapRegion::LogHRGrainSize). So the barrier
>> uses a sequence like this
>>
>> movz reg #<16bit>
>> movk reg #<16bit>, #16
>> movk reg #<16bit>, #32
>> ldrb reg, reg
>> . . .
>> lsr reg2, reg, reg2
>>
>> The 16 bit quantities compose to the address of the field. The 3 x mov
>> sequence is marked with a runtime relocation which ensures that it is
>> updated when generated code is restored from the SCCache. That
>> requires the field address to be inserted in the SCC address table's
>> list of external addresses.
>>
>> This scheme requires repeating that series of 3 x mov + ldrb
>> instructions at every object field store in a given compiled method.
>> That also implies a runtime relocation for each such sequence when the
>> code is restored from the SCCache.
>>
>> With C2 the barrier manifests as a (Set dst con) for a special ConP
>> value (operand con has type immRegionGrainShift) feeding a LoadB. I
>> guess C2 might conceivably be able to optimize away some of the repeat
>> movz/k and ldrb sequences if it is able to keep the address or byte
>> value in a register or spill slot but I would not expect that to be
>> likely.
>>
>>> I suppose this requires us to put multiple copies of
>>> G1HeapRegion::LogHRGrainSize inside the AOT code, as there's a limit
>>> for the offset. But we will be patching fewer places than every site
>>> that needs to know the grain size.
>> I think what you are suggesting here is what I described as option 4.
>> i.e. we put the grain size in the nmethod const section (or in a
>> dedicated data location for a non-nmethod blob) and insert a
>> pc-relative load in the barrier to feed the lsr.
>>
>> With AOT code this would require a special relocation to mark the
>> constants area slot (or the non-method blob data slot), let's call it
>> reloc_grain_shift_const. It would patch the constant to whatever value
>> field G1HeapRegion::LogHRGrainSize has in the current runtime (or
>> rather to whatever grain size is reported by the barrier card table).
>> We don't have such a reloc at present. We do have an existing reloc
>> for a runtime data address which is why I implemented option 2 first
>> (to work out where I would need to tweak the compilers and barrier set
>> assemblers plus auxiliary classes).
>>
>> With option 4 I believe we will only need one occurrence of the
>> constant. On AArch64 we would use either adr or adrp + add to install
>> a pc-relative address into a register and then an ldrb via that register.
>>
>> adr reg, #<21bits>
>> ldrb reg, reg
>> ...
>> lsr reg2, reg, reg2
>>
>> or
>>
>> adrp reg, #<21bits> # selects 12 bit-aligned page
>> add reg, #<12bits>
>> ldrb reg, reg
>> ...
>> lsr reg2, reg, reg2
>>
>> The adr/adrp instructions do not need relocating which is why scheme 4
>> would only require 1 relocation per nmethod (or non-nmethod blob).
>>
>> Option 3 involves generating the normal barrier
>>
>> lsr reg, #imm, reg
>>
>> The difference is that for AOT code we would mark the instruction with
>> a new relocation, let's call it reloc_grain_shift_immediate. Patching
>> for this reloc would assert that the corresponding instruction is a
>> shift and that the current GC barrier set is using a card table. It
>> would update the immediate operand with whatever grain size shift was
>> reported by the card table.
>>
>> Like scheme 2 this would require a reloc for every object field write
>> in an nmethod (or non-nmethod blob).
>>
>> regards,
>>
>>
>> Andrew Dinn
>> -----------
>>
>
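
For reference, the way the three 16 bit quantities in the quoted movz/movk sequence compose to the field address can be sketched as follows. split_address and compose_address are hypothetical helper names used only for illustration, and the sketch assumes the address fits in 48 bits, which is what a movz plus two movk can express.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Split a 48-bit address into the three 16-bit immediates carried by
// movz (bits 0-15), movk #16 (bits 16-31) and movk #32 (bits 32-47).
std::array<uint16_t, 3> split_address(uint64_t addr) {
    assert((addr >> 48) == 0);  // the 3-instruction form covers 48 bits only
    return { static_cast<uint16_t>(addr),
             static_cast<uint16_t>(addr >> 16),
             static_cast<uint16_t>(addr >> 32) };
}

// Recompose the address the way the movz/movk sequence builds it in a register.
uint64_t compose_address(const std::array<uint16_t, 3>& imm) {
    uint64_t reg = imm[0];                       // movz reg, #imm0
    reg |= static_cast<uint64_t>(imm[1]) << 16;  // movk reg, #imm1, #16
    reg |= static_cast<uint64_t>(imm[2]) << 32;  // movk reg, #imm2, #32
    return reg;
}
```

A runtime relocation on the sequence amounts to recomputing the three immediates for the address the field has in the restoring JVM and rewriting them in place.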
--
regards,
Andrew Dinn
-----------
Red Hat Distinguished Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill