[External] : Re: premain: Possible solutions to use runtime G1 region grain size in AOT code (JDK-8335440)

Vladimir Kozlov vladimir.kozlov at oracle.com
Wed Jul 31 20:06:41 UTC 2024


Very nice. I like it.

Thanks,
Vladimir K

On 7/31/24 8:23 AM, Andrew Dinn wrote:
> I now have a fourth solution to the region resizing problem which I think is probably the nicest looking one so far:
> 
> 
> https://urldefense.com/v3/__https://github.com/adinn/leyden/compare/premain...adinn:leyden:JDK-8335440-load-via-codecache?expand=1__;!!ACWV5N9M2RV99hQ!JI8n2MMyF844NBqR63MhWXggKzPCoouZ8pcwgasAa94iYPRzpcD7af-zSicg5Wm0nv9ZyaF1CpaC74Mqn3g$
> Opinions welcome!
> 
> This version relies on a code cache AOT data area, generated into the initial stubs blob, which stores runtime constants 
> that parameterize operation of AOT code. The AOT data area is initialized with the current runtime constants immediately 
> after universe init. AOT code loads the constants indirectly via the region base address rather than embedding them into 
> instructions. Non-AOT code can continue to use these constants as immediate values.
> 
> With this patch I have included the region grain size shift and the card table shift in the data area. So, this patch 
> ensures that AOT code loads both grain shift and card shift counts rather than using immediates. I tested it passing 
> -XX:GCCardSizeInBytes=128 -Xmx4M to a production run and it works fine.
> 
> The only address that needs relocating wit this patch is the AOT data area base. Note that it was useful using this to 
> implement loading of two AOT code parameters as it highlights a general problem that is not face with only one field.
> 
> I decide to register only the base address of the AOT data area with the code cache as a relocatable (external) address 
> because this is more robust than requiring registration of the address of every data field embedded in the AOT data 
> area. However, in order for the AOT cache to be able to relocate loads from the data area they need to be implemented by 
> installing the base address in a register and then adding the offset before executing the load.
> 
> This is fine for stub code where the generator can simply plant a load effective address (lea) of a pointer constant 
> into a register and then add in the offset. However, I know C2 Value optimizations will certainly coalesce a pointer 
> constant load followed by an add to an updated pointer constant load. I think this can also happen with C1 (it doesn't 
> at the moment because in the current C1 code the grain shift load -- at offset 0 -- happens inline, while the card shift 
> load -- at offset 4 -- only happens in a callout to stub code). I fixed this general problem by ensuring that the C1 and 
> C2 back ends detect pointer constants that lie anywhere in the data area address *range*. For (only) those cases the lea 
> is implemenetd by generating a base address load plus offset add.
> 
> It would be nice to be ale to fold the offset into the subsequent load. I'll look into that as a follow-up. Either way, 
> I think this looks like the simplest and most robust solution to ensuring that AOT code is correctly parameterized at 
> runtime. It is particularly nice that it doesn't need any new relocations.
> 
> regards,
> 
> 
> Andrew Dinn
> -----------
> 
> On 24/07/2024 20:43, Vladimir Kozlov wrote:
>> Thank you for checking code for card_table_address.
>>
>>  > Oh, that's a nuisance. I was concerned that I was perhaps using the wrong reloc model here. Should I be encoding info
>>  > about what/how to patch the target instruction(s) using reloc data?
>>
>> I am currently thinking about adding indication what relocation is pointing on: blob, stubs, external address, string.
>> Currently we are looking through all our address tables in SCCache to find match. Would be nice if we can get this 
>> information from relocation. And I also thought about using format for it but realized it is PD and it reserves bits 
>> in RelocInfo word. On other hand we can add an other 12 (31 - 18 - 1) new relocations without using additional bits.
>>
>> In short add new relocation if you need it.
>>
>> Thanks,
>> Vladimir K
>>
>> On 7/24/24 7:40 AM, Andrew Dinn wrote:
>>> On 23/07/2024 18:44, Vladimir Kozlov wrote:
>>>> I agree. As we discussed on meeting we may use it to patch compressed oops/klass code too.
>>>
>>> Yes, we could potentially do something similar to adjust the shift and base adjustment employed when generating a 
>>> narrow oop or klass encode or decode. However, that would only be useful with certain limits.
>>>
>>> 1) We can only cater for a change from one narrow-oop/klass base+shift configuration to another base+shift 
>>> configuration. Patching code to switch from narrow oops/klass pointers to full width (or vice versa) would require 
>>> adjusting not just the oop/klass load/store but also other instructions that hard-wire header/payload field sizes and 
>>> offsets based on the narrow/full width layout assumptions. Likewise it would require resizing and repacking objects 
>>> stored in the CDS mapped heap section.
>>>
>>> 2) Even if we restrict ourselves to the case where we retain narrow oops and simply allow for a change of base and/or 
>>> shift (i.e. keep the object layouts the same), that will only work if pointers installed in mapped CDS heap objects 
>>> are also rewritten to use the runtime encoding.
>>>
>>> 3) When generating AOT code for the encode and decode we would have to allow for the worst case i.e. leave room 
>>> (nops) for both heap-base adjustment and shift even when they are not needed at generate time.
>>>
>>>> We have the same issue with card_table_address. I actually don't know why current AOT code works??? May be because 
>>>> it is the same value in training and production runs. But we need to fix it too.
>>>
>>>> It is constant node which is embedded into AddP node:
>>>> https://urldefense.com/v3/__https://github.com/openjdk/leyden/blob/premain/src/hotspot/share/gc/shared/c2/cardTableBarrierSetC2.cpp*L96__;Iw!!ACWV5N9M2RV99hQ!LjgTGnmqk1Ez4pzqXuTXAbZZfHhhP9a1zAFdWGz7V59xO3ggQYVrdW3jTlSNYML_SvRN0Ip_HA-9E0w0Jbo$ 
>>>
>>> I believe C2 is ok. On AArch64 the C2 back end has an operand called immByteMapBase that matches any use of 
>>> byte_map_base as a pointer constant. A corresponding rule ensures that any attempt to load that constant pointer is 
>>> done by calling
>>>
>>>    __ load_byte_map_base()
>>>
>>> and the macr assembler uses an ExternalAddress when StoreCachedCode is set.
>>>
>>>> I marked it only in platform specific code:
>>>> https://urldefense.com/v3/__https://github.com/openjdk/leyden/blob/premain/src/hotspot/cpu/x86/gc/shared/cardTableBarrierSetAssembler_x86.cpp*L68__;Iw!!ACWV5N9M2RV99hQ!LjgTGnmqk1Ez4pzqXuTXAbZZfHhhP9a1zAFdWGz7V59xO3ggQYVrdW3jTlSNYML_SvRN0Ip_HA-92hI_Afg$ 
>>>
>>> I also think there is no mystery as to why this works on AArch64. The equivalent code in 
>>> cardTableBarrierSetAssembler_aarch64.cpp also calls
>>>
>>>    __ load_byte_map_base()
>>>
>>> However, that said, looking at the barrier card table write it seems the card table shift is another value that the 
>>> user might decide to reconfigure between assembly and production runs. The shift is derived form GCCardTableSize 
>>> which is a gc global command line option so susceptible to being reset.
>>>
>>> We could use an aot_reloc with a different format to tag card table shift lsr instructions and patch them at shared 
>>> code load to use the current runtime GCCardTableSize. Of course, this is not as urgent a problem as the grain size 
>>> shift because the card table size only ever changes thanks to an explicit command line request whereas the grain size 
>>> may be changed ergonomically when -Xmx is passed.
>>>
>>>> Be careful with using `format()` in shared code because it depends on platform specific setting `format_width` in 
>>>> `relocInfo_<arch>.hpp` (32-bit arm has format_width = 0).
>>>
>>> Oh, that's a nuisance. I was concerned that I was perhaps using the wrong reloc model here. Should I be encoding info 
>>> about what/how to patch the target instruction(s) using reloc data?
>>>
>>> regards,
>>>
>>>
>>> Andrew Dinn
>>> -----------
>>>
>>
> 


More information about the leyden-dev mailing list