[External] : Re: premain: Possible solutions to use runtime G1 region grain size in AOT code (JDK-8335440)
Vladimir Kozlov
vladimir.kozlov at oracle.com
Thu Jul 18 16:15:51 UTC 2024
On 7/18/24 4:00 AM, Andrew Dinn wrote:
> On 17/07/2024 18:27, Vladimir Kozlov wrote:
>> > We don't have such a reloc at present..
>>
>> What about section_word_Relocation so we can put grain value into constants section?
>
> I agree that when compiling an nmethod we would need to use a section_word_type reloc to mark the adrp that accesses the
> constant. That would ensure that the offset used by the adrp is kept consistent across buffer resizes and at install
> time, when the displacement may change.
>
> However, what I was talking about was a new reloc, needed only when the SCCache restores code, which would mark the
> constant itself. When AOT code is restored we need to ensure any such constant is rewritten using the runtime grain size.
>
> We could attempt to do the rewrite of the constant as a side-effect of processing the section_word_type reloc during
> code restore. However, we would need to know for sure that the constant being accessed by the adrp was definitely the
> grain size. Is that what you were thinking of, Vladimir?
>
> Of course that would not work for stubs which need to include a barrier and a reference to the barrier shift (I believe
> this only applies for some of the memory copy stubs). In this case we would have to load the constant from a data slot
> allocated in amongst the instructions. So I think we would not be able to identify the location of the constant with a
> section_word_type reloc.
Yes, you are right, section_word_type will not work.
What about allocating a word in the CodeCache, as we do for some intrinsic stub tables? You would need to generate it
only once and could use a runtime_type relocation to access it.
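Roughly what I have in mind, as a sketch only (the names are illustrative, not the existing stub-table or SCCache code):

    // one word allocated once in the CodeCache, filled in from the card table at startup
    static address _grain_shift_cell;                        // hypothetical

    // barrier emission on AArch64
    __ lea(rscratch1, ExternalAddress(_grain_shift_cell));   // relocated, so the AOT restore can repoint it
    __ ldrb(rscratch1, Address(rscratch1));                  // load the runtime grain shift
    __ lsrv(tmp, tmp, rscratch1);                            // region-cross check
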
It comes down to loading via an existing relocation vs. a specialized relocation for the immediate value (option three).
I would like to see how complex option three is.
Thanks,
Vladimir K
>
> regards,
>
>
> Andrew Dinn
> -----------
>
>> On 7/17/24 3:15 AM, Andrew Dinn wrote:
>>> Hi Ioi,
>>>
>>> On 16/07/2024 17:33, ioi.lam at oracle.com wrote:
>>>>
>>>> On 7/15/24 9:23 AM, Andrew Dinn wrote:
>>>>> . . .
>>>>> The second solution modifies barrier generation so that, when the SCCache is open for writing, the shift count is
>>>>> loaded from a runtime field, G1HeapRegion::LogHRGrainSize, i.e. the same field that determines the immediate count
>>>>> used for normal generation. In order to make this visible to the compilers and the SCC address table, the address
>>>>> of this field is exported via the card table. This solution requires the AOT code to reference the target address
>>>>> using a runtime address relocation. Once again, if the SCCache is not open for writing, the count is generated as
>>>>> normal, i.e. as an immediate operand.
>>>>>
>>>>>
>>>> Is G1HeapRegion::LogHRGrainSize loaded with a PC offset?
>>>>
>>>> ldr grain, [pc, #5678]
>>>
>>> That's not what this option does. The barrier loads the grain size indirectly via a constant static field address,
>>> i.e. via address &G1HeapRegion::LogHRGrainSize (well, actually, the constant is determined by whatever address is
>>> reported by the barrier card table, but effectively it is &G1HeapRegion::LogHRGrainSize). So the barrier uses a
>>> sequence like this:
>>>
>>> movz reg, #<16bit>
>>> movk reg, #<16bit>, lsl #16
>>> movk reg, #<16bit>, lsl #32
>>> ldrb reg, [reg]
>>> . . .
>>> lsr reg2, reg2, reg
>>>
>>> The 16-bit quantities compose the address of the field. The 3 x mov sequence is marked with a runtime relocation
>>> which ensures that it is updated when generated code is restored from the SCCache. That requires the field address to
>>> be inserted in the SCC address table's list of external addresses.
>>>
>>> This scheme requires repeating that series of 3 x mov + ldrb instructions at every object field store in a given
>>> compiled method. That also implies a runtime relocation for each such sequence when the code is restored from the
>>> SCCache.
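>>>
>>> In MacroAssembler terms each such barrier would come out roughly like this (a sketch only; the exact reloc spec
>>> the SCC address table wants may well differ):
>>>
>>>     address shift_addr = (address)&G1HeapRegion::LogHRGrainSize;  // or whatever address the card table reports
>>>     __ relocate(external_word_Relocation::spec(shift_addr));      // runtime reloc for the SCCache restore
>>>     __ movptr(tmp, (uintptr_t)shift_addr);                        // movz + movk + movk
>>>     __ ldrb(tmp, Address(tmp));                                   // current grain shift
>>>     __ lsrv(tmp2, tmp2, tmp);                                     // region-cross test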
>>>
>>> With C2 the barrier manifests as a (Set dst con) for a special ConP value (operand con has type immRegionGrainShift)
>>> feeding a LoadB. I guess C2 might conceivably be able to optimize away some of the repeated movz/movk and ldrb sequences
>>> if it is able to keep the address or byte value in a register or spill slot but I would not expect that to be likely.
>>>
>>>> I suppose this requires us to put multiple copies of G1HeapRegion::LogHRGrainSize inside the AOT code, as there's a
>>>> limit on the offset. But we will be patching fewer places than every site that needs to know the grain size.
>>> I think what you are suggesting here is what I described as option 4, i.e. we put the grain size in the nmethod const
>>> section (or in a dedicated data location for a non-nmethod blob) and insert a pc-relative load in the barrier to feed
>>> the lsr.
>>>
>>> With AOT code this would require a special relocation to mark the constants area slot (or the non-nmethod blob data
>>> slot), let's call it reloc_grain_shift_const. It would patch the constant to whatever value the field
>>> G1HeapRegion::LogHRGrainSize has in the current runtime (or rather to whatever grain size is reported by the barrier
>>> card table). We don't have such a reloc at present. We do have an existing reloc for a runtime data address, which is
>>> why I implemented option 2 first (to work out where I would need to tweak the compilers and barrier set assemblers
>>> plus auxiliary classes).
>>>
>>> With option 4 I believe we will only need one occurrence of the constant. On AArch64 we would use either adr or adrp
>>> + add to install a pc-relative address into a register and then an ldrb via that register.
>>>
>>> adr reg, #<21bits>
>>> ldrb reg, [reg]
>>> ...
>>> lsr reg2, reg2, reg
>>>
>>> or
>>>
>>> adrp reg, #<21bits>      # selects the 4KB-aligned page
>>> add reg, reg, #<12bits>  # low 12 bits of the address
>>> ldrb reg, [reg]
>>> ...
>>> lsr reg2, reg2, reg
>>>
>>> The adr/adrp instructions do not need relocating, which is why scheme 4 would only require 1 relocation per nmethod
>>> (or non-nmethod blob).
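>>>
>>> Patching for reloc_grain_shift_const at restore time should then amount to something like this (the hook name is
>>> invented for illustration; it would plug into however the SCCache walks relocations when it loads code):
>>>
>>>     void grain_shift_const_Relocation::fix_at_aot_load() {
>>>       // addr() points at the byte slot reserved in the constants area (or the blob data slot)
>>>       *(uint8_t*)addr() = (uint8_t)G1HeapRegion::LogHRGrainSize;  // i.e. the shift the card table reports
>>>     }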
>>>
>>> Option 3 involves generating the normal barrier
>>>
>>> lsr reg, reg, #imm
>>>
>>> The difference is that for AOT code we would mark the instruction with a new relocation, let's call it
>>> reloc_grain_shift_immediate. Patching for this reloc would assert that the corresponding instruction is a shift and
>>> that the current GC barrier set is using a card table. It would update the immediate operand with whatever grain size
>>> shift was reported by the card table.
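>>>
>>> On AArch64 the patch itself is cheap: lsr #imm is an alias of ubfm and the shift count sits in the immr field
>>> (bits 21:16), so the reloc handler only has to rewrite those six bits. Roughly (again, names invented):
>>>
>>>     void grain_shift_imm_Relocation::fix_at_aot_load() {
>>>       // would also assert that the instruction really is a shift and that the barrier set uses a card table
>>>       uint32_t* insn = (uint32_t*)addr();
>>>       uint32_t shift = (uint32_t)G1HeapRegion::LogHRGrainSize;    // runtime grain shift
>>>       *insn = (*insn & ~(0x3fu << 16)) | ((shift & 0x3f) << 16);  // rewrite the immr field
>>>       ICache::invalidate_word((address)insn);
>>>     }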
>>>
>>> Like scheme 2 this would require a reloc for every object field write in an nmethod (or non-nmethod blob).
>>>
>>> regards,
>>>
>>>
>>> Andrew Dinn
>>> -----------
>>>
>>
>