[External] : Re: premain: Possible solutions to use runtime G1 region grain size in AOT code (JDK-8335440)
Vladimir Kozlov
vladimir.kozlov at oracle.com
Thu Aug 15 22:30:37 UTC 2024
On 8/15/24 3:06 PM, Vladimir Kozlov wrote:
> No.
>
> Why you still use base+offsets for constants address? Is it for Aarch64?
>
> What I want is something like this:
>
> static address grain_shift_address() { return
> &_aot_runtime_constants::_grain_shift; }
>
> And you will not need `#if INCLUDE_CDS` if you put AOTRuntimeConstants
> into SCCache.cpp.
>
> I don't see the need to separate create_field_address_list() code from
> field_addresses_list().
>
> Add assert to field_addresses_list() to check that values are initialized.
Actually you don't need assert since we need only addresses of fields.
I don't see use of aot_runtime_constants_at() anymore.
Also why you need aot_runtime_constants()? Why not have
_aot_runtime_constants as static field of AOTRuntimeConstants class
which you can initialize in (again) SCCache.cpp?
Vladimir K
>
> Vladimir K
>
> On 8/15/24 1:43 PM, Ashutosh Mehra wrote:
>> Hi Vladimir,
>> Here is the updated patch. We have refactored the code as per your
>> suggestions.
>> https://github.com/openjdk/leyden/compare/premain...ashu-mehra:leyden:JDK-8335440-load-via-codecache?expand=1 <https://urldefense.com/v3/__https://github.com/openjdk/leyden/compare/premain...ashu-mehra:leyden:JDK-8335440-load-via-codecache?expand=1__;!!ACWV5N9M2RV99hQ!KJr_KHIS5usVHG5WgzaX4UOvAVI4Y2Z-ZoblB2fhuGsj2i-89W0RoSddeMDGtpieLZydGC4pagdnoaYt2zbt_w$>
>>
>> Let us know if you have any more suggestions.
>>
>> Thanks,
>> - Ashutosh Mehra
>>
>>
>> On Thu, Aug 15, 2024 at 11:02 AM Vladimir Kozlov
>> <vladimir.kozlov at oracle.com <mailto:vladimir.kozlov at oracle.com>> wrote:
>>
>> > Ashu is proposing adding support for these latter instructions
>> to C1 so
>> > the barrier code does not have to pin the shift count to ECX.
>> Do we
>> > care? Or can we live with the effect of pinning?
>>
>> We only care about correctness currently. Performance (registers
>> spilling) is not issue. And I am concern about size and complexity of
>> Leyden specific changes which eventually will be pushed into
>> mainline.
>> Unless you see benefits for new C1 shift instructions in mainline I
>> would go with using ECX. And again, do this if code for using ECX is
>> simpler than adding new instructions. If it is not - use new
>> instructions.
>>
>> I was hopping that "Late Barrier Expansion for G1" JEP may help here
>> too
>> but it has only C2 changes :(
>>
>> I agree with updating this patch and may be even pushing it into
>> premain
>> (for testing) regardless "Late Barrier Expansion for G1" JEP status.
>>
>> Thanks,
>> Vladimir K
>>
>>
>> On 8/15/24 3:18 AM, Andrew Dinn wrote:
>> > On 15/08/2024 00:50, Vladimir Kozlov wrote:
>> >> Do we really need AOTRuntimeConstants instance be a stub in
>> CodeCache?
>> >> Looking on this code and I think this complicate implementation
>> if use
>> >> ExternalAddress relocation anyway. Can it be static instance in
>> >> SCCache.cpp which we initialize the same way as in current
>> changes?
>> >
>> > Yes, you are right: we don't need the AOTRuntimeConstants struct
>> to be
>> > embedded in the cache if we are using an ExternalRelocation. The
>> initial
>> > idea for doing that was because it provides a 32-bit bound on the
>> offset
>> > from any struct field to a corresponding field load (lea). That
>> would
>> > allow us to use PC-relative loads of the field addresses -- which
>> saves
>> > instruction footprint on AArch64, possibly also on x86? However,
>> we can
>> > only do that if we also implement a new relocation (which seems
>> > unnecessary).
>> >
>> > So, we will redefine the AOTRuntimeConstants struct as a C++
>> static field.
>> >
>> >> I do not understand why you have concern about having separate
>> >> relocation for each constant in AOTRuntimeConstants. We have
>> only 2
>> >> now. Even they grow I don't think we will have a lot of them.
>> Actually
>> >> the question: will it simplify code if each constant's address is
>> >> treated as separate and not offset in AOTRuntimeConstants?
>> Instead of
>> >> loading AOTRuntimeConstants address and then adding offset we
>> just
>> >> load address of constant.
>> >
>> > Yes, you are right again: if we register the address of every
>> field in
>> > the struct in the SCC address table then that will simplify the
>> back end
>> > code. I was not sure how much we wanted to rely on address table
>> > registration but given that we *are* reliant on it just now
>> then we
>> > might as well have the benefit.
>> >
>> > So, we will do the extra registration and limit the back end
>> changes to
>> > detect ConP occurrences in the AOTRuntimeConstants range and load
>> them
>> > using an ExternalAddress reloc.
>> >
>> >> I don't understand question about C1 (I am not familiar with
>> it). What
>> >> it generates for shifts now? Why you need 3-operand shift?
>> >
>> > I'm not sure '3-operand' is the right term to use to describe
>> what is at
>> > stake here. It's really about the kind of operand rather than
>> their
>> > number. The current region/card shift operations employ an
>> immediate
>> > shift count. The AOT region/card shift operations need to be
>> fed the
>> > shift count via a register (the target of the constant load).
>> >
>> > C1 does currently generate shifts where the shift count is in a
>> register
>> > but it only does so using the flavours of shift (sar/sal/shr/shl)
>> that
>> > require the register to be ECX. It doesn't currently generate the
>> > alternative instructions (sarx/shlx/shrx) that accept a shift
>> count in
>> > any register. I'm not 100% sure why but I think this is because
>> not all
>> > CPUs implement them (they require CPUID feature BM2) or it may be
>> to do
>> > with them setting/not setting flags.
>> >
>> > Ashu is proposing adding support for these latter instructions to
>> C1 so
>> > the barrier code does not have to pin the shift count to ECX.
>> Do we
>> > care? Or can we live with the effect of pinning?
>> >
>> >> Code style: I noticed and aarch64.ad
>>
>> <https://urldefense.com/v3/__http://aarch64.ad__;!!ACWV5N9M2RV99hQ!KJr_KHIS5usVHG5WgzaX4UOvAVI4Y2Z-ZoblB2fhuGsj2i-89W0RoSddeMDGtpieLZydGC4pagdnoaYhejMFCA$> still using old stile of
>> >> instructions definition using `enc_class`. We don't do this any
>> more
>> >> (at least in x86*.ad) when we adding new instructions with one
>> >> implementation. Please use `ins_encode %{ %}` in such cases.
>> >
>> > Sure, we will fix that.
>> >
>> >> I think, if you need additional register you should push/pop to
>> >> guarantee we don't stomp over some used value.
>> > Ok, we will play safe and do that.
>> >
>> > One final comment: Much of this discussion is going to be
>> side-stepped
>> > by the G1 late barrier implementation that is currently working
>> its way
>> > into mainline. That will remove the need to inject barrier IR
>> subgraphs
>> > into the C1 and C2 IR graphs and eliminate most (all?) of the
>> associated
>> > C2 reduction rules from the ad files. I think we should still
>> update the
>> > patch to accommodate the feedback above. We can simplify it
>> further once
>> > the G1 changes come in from mainline and pursue in parallel the
>> idea of
>> > accommodating changes to oop encodings using this model.
>> >
>> > regards,
>> >
>> >
>> > Andrew Dinn
>> > -----------
>> >
>>
More information about the leyden-dev
mailing list