[External] : Re: premain: Possible solutions to use runtime G1 region grain size in AOT code (JDK-8335440)

Thu Aug 15 22:06:23 UTC 2024

No.

Why you still use base+offsets for constants address? Is it for Aarch64?

What I want is something like this:

static address grain_shift_address() { return 
&_aot_runtime_constants::_grain_shift; }

And you will not need `#if INCLUDE_CDS` if you put AOTRuntimeConstants 
into SCCache.cpp.

I don't see the need to separate create_field_address_list() code from 
field_addresses_list().

Add assert to field_addresses_list() to check that values are initialized.

Vladimir K

On 8/15/24 1:43 PM, Ashutosh Mehra wrote:
> Hi Vladimir,
> Here is the updated patch. We have refactored the code as per your 
> suggestions.
> https://github.com/openjdk/leyden/compare/premain...ashu-mehra:leyden:JDK-8335440-load-via-codecache?expand=1 <https://urldefense.com/v3/__https://github.com/openjdk/leyden/compare/premain...ashu-mehra:leyden:JDK-8335440-load-via-codecache?expand=1__;!!ACWV5N9M2RV99hQ!KJr_KHIS5usVHG5WgzaX4UOvAVI4Y2Z-ZoblB2fhuGsj2i-89W0RoSddeMDGtpieLZydGC4pagdnoaYt2zbt_w$>
> 
> Let us know if you have any more suggestions.
> 
> Thanks,
> - Ashutosh Mehra
> 
> 
> On Thu, Aug 15, 2024 at 11:02 AM Vladimir Kozlov 
> <vladimir.kozlov at oracle.com <mailto:vladimir.kozlov at oracle.com>> wrote:
> 
>       > Ashu is proposing adding support for these latter instructions
>     to C1 so
>       > the barrier code does not have to pin the shift count to ECX. Do we
>       > care? Or can we live with the effect of pinning?
> 
>     We only care about correctness currently. Performance (registers
>     spilling) is not issue. And I am concern about size and complexity of
>     Leyden specific changes which eventually will be pushed into mainline.
>     Unless you see benefits for new C1 shift instructions in mainline I
>     would go with using ECX. And again, do this if code for using ECX is
>     simpler than adding new instructions. If it is not - use new
>     instructions.
> 
>     I was hopping that "Late Barrier Expansion for G1" JEP may help here
>     too
>     but it has only C2 changes :(
> 
>     I agree with updating this patch and may be even pushing it into
>     premain
>     (for testing) regardless "Late Barrier Expansion for G1" JEP status.
> 
>     Thanks,
>     Vladimir K
> 
> 
>     On 8/15/24 3:18 AM, Andrew Dinn wrote:
>      > On 15/08/2024 00:50, Vladimir Kozlov wrote:
>      >> Do we really need AOTRuntimeConstants instance be a stub in
>     CodeCache?
>      >> Looking on this code and I think this complicate implementation
>     if use
>      >> ExternalAddress relocation anyway. Can it be static instance in
>      >> SCCache.cpp which we initialize the same way as in current changes?
>      >
>      > Yes, you are right: we don't need the AOTRuntimeConstants struct
>     to be
>      > embedded in the cache if we are using an ExternalRelocation. The
>     initial
>      > idea for doing that was because it provides a 32-bit bound on the
>     offset
>      > from any struct field to a corresponding field load (lea). That
>     would
>      > allow us to use PC-relative loads of the field addresses -- which
>     saves
>      > instruction footprint on AArch64, possibly also on x86? However,
>     we can
>      > only do that if we also implement a new relocation (which seems
>      > unnecessary).
>      >
>      > So, we will redefine the AOTRuntimeConstants struct as a C++
>     static field.
>      >
>      >> I do not understand why you have concern about having separate
>      >> relocation for each constant in AOTRuntimeConstants. We have only 2
>      >> now. Even they grow I don't think we will have a lot of them.
>     Actually
>      >> the question: will it simplify code if each constant's address is
>      >> treated as separate and not offset in AOTRuntimeConstants?
>     Instead of
>      >> loading AOTRuntimeConstants address and then adding offset we just
>      >> load address of constant.
>      >
>      > Yes, you are right again: if we register the address of every
>     field in
>      > the struct in the SCC address table then that will simplify the
>     back end
>      > code. I was not sure how much we wanted to rely on address table
>      > registration but given that we *are* reliant on it just now then we
>      > might as well have the benefit.
>      >
>      > So, we will do the extra registration and limit the back end
>     changes to
>      > detect ConP occurrences in the AOTRuntimeConstants range and load
>     them
>      > using an ExternalAddress reloc.
>      >
>      >> I don't understand question about C1 (I am not familiar with
>     it). What
>      >> it generates for shifts now? Why you need 3-operand shift?
>      >
>      > I'm not sure '3-operand' is the right term to use to describe
>     what is at
>      > stake here. It's really about the kind of operand rather than their
>      > number. The current region/card shift operations employ an immediate
>      > shift count. The AOT region/card shift operations need to be fed the
>      > shift count via a register (the target of the constant load).
>      >
>      > C1 does currently generate shifts where the shift count is in a
>     register
>      > but it only does so using the flavours of shift (sar/sal/shr/shl)
>     that
>      > require the register to be ECX. It doesn't currently generate the
>      > alternative instructions (sarx/shlx/shrx) that accept a shift
>     count in
>      > any register. I'm not 100% sure why but I think this is because
>     not all
>      > CPUs implement them (they require CPUID feature BM2) or it may be
>     to do
>      > with them setting/not setting flags.
>      >
>      > Ashu is proposing adding support for these latter instructions to
>     C1 so
>      > the barrier code does not have to pin the shift count to ECX. Do we
>      > care? Or can we live with the effect of pinning?
>      >
>      >> Code style: I noticed and aarch64.ad
>     <https://urldefense.com/v3/__http://aarch64.ad__;!!ACWV5N9M2RV99hQ!KJr_KHIS5usVHG5WgzaX4UOvAVI4Y2Z-ZoblB2fhuGsj2i-89W0RoSddeMDGtpieLZydGC4pagdnoaYhejMFCA$> still using old stile of
>      >> instructions definition using `enc_class`. We don't do this any
>     more
>      >> (at least in x86*.ad) when we adding new instructions with one
>      >> implementation. Please use `ins_encode %{ %}` in such cases.
>      >
>      > Sure, we will fix that.
>      >
>      >> I think, if you need additional register you should push/pop to
>      >> guarantee we don't stomp over some used value.
>      > Ok, we will play safe and do that.
>      >
>      > One final comment: Much of this discussion is going to be
>     side-stepped
>      > by the G1 late barrier implementation that is currently working
>     its way
>      > into mainline. That will remove the need to inject barrier IR
>     subgraphs
>      > into the C1 and C2 IR graphs and eliminate most (all?) of the
>     associated
>      > C2 reduction rules from the ad files. I think we should still
>     update the
>      > patch to accommodate the feedback above. We can simplify it
>     further once
>      > the G1 changes come in from mainline and pursue in parallel the
>     idea of
>      > accommodating changes to oop encodings using this model.
>      >
>      > regards,
>      >
>      >
>      > Andrew Dinn
>      > -----------
>      >
>