[External] : Re: premain: Possible solutions to use runtime G1 region grain size in AOT code (JDK-8335440)
Vladimir Kozlov
vladimir.kozlov at oracle.com
Thu Aug 15 22:06:23 UTC 2024
No.
Why you still use base+offsets for constants address? Is it for Aarch64?
What I want is something like this:
static address grain_shift_address() { return
&_aot_runtime_constants::_grain_shift; }
And you will not need `#if INCLUDE_CDS` if you put AOTRuntimeConstants
into SCCache.cpp.
I don't see the need to separate create_field_address_list() code from
field_addresses_list().
Add assert to field_addresses_list() to check that values are initialized.
Vladimir K
On 8/15/24 1:43 PM, Ashutosh Mehra wrote:
> Hi Vladimir,
> Here is the updated patch. We have refactored the code as per your
> suggestions.
> https://github.com/openjdk/leyden/compare/premain...ashu-mehra:leyden:JDK-8335440-load-via-codecache?expand=1 <https://urldefense.com/v3/__https://github.com/openjdk/leyden/compare/premain...ashu-mehra:leyden:JDK-8335440-load-via-codecache?expand=1__;!!ACWV5N9M2RV99hQ!KJr_KHIS5usVHG5WgzaX4UOvAVI4Y2Z-ZoblB2fhuGsj2i-89W0RoSddeMDGtpieLZydGC4pagdnoaYt2zbt_w$>
>
> Let us know if you have any more suggestions.
>
> Thanks,
> - Ashutosh Mehra
>
>
> On Thu, Aug 15, 2024 at 11:02 AM Vladimir Kozlov
> <vladimir.kozlov at oracle.com <mailto:vladimir.kozlov at oracle.com>> wrote:
>
> > Ashu is proposing adding support for these latter instructions
> to C1 so
> > the barrier code does not have to pin the shift count to ECX. Do we
> > care? Or can we live with the effect of pinning?
>
> We only care about correctness currently. Performance (registers
> spilling) is not issue. And I am concern about size and complexity of
> Leyden specific changes which eventually will be pushed into mainline.
> Unless you see benefits for new C1 shift instructions in mainline I
> would go with using ECX. And again, do this if code for using ECX is
> simpler than adding new instructions. If it is not - use new
> instructions.
>
> I was hopping that "Late Barrier Expansion for G1" JEP may help here
> too
> but it has only C2 changes :(
>
> I agree with updating this patch and may be even pushing it into
> premain
> (for testing) regardless "Late Barrier Expansion for G1" JEP status.
>
> Thanks,
> Vladimir K
>
>
> On 8/15/24 3:18 AM, Andrew Dinn wrote:
> > On 15/08/2024 00:50, Vladimir Kozlov wrote:
> >> Do we really need AOTRuntimeConstants instance be a stub in
> CodeCache?
> >> Looking on this code and I think this complicate implementation
> if use
> >> ExternalAddress relocation anyway. Can it be static instance in
> >> SCCache.cpp which we initialize the same way as in current changes?
> >
> > Yes, you are right: we don't need the AOTRuntimeConstants struct
> to be
> > embedded in the cache if we are using an ExternalRelocation. The
> initial
> > idea for doing that was because it provides a 32-bit bound on the
> offset
> > from any struct field to a corresponding field load (lea). That
> would
> > allow us to use PC-relative loads of the field addresses -- which
> saves
> > instruction footprint on AArch64, possibly also on x86? However,
> we can
> > only do that if we also implement a new relocation (which seems
> > unnecessary).
> >
> > So, we will redefine the AOTRuntimeConstants struct as a C++
> static field.
> >
> >> I do not understand why you have concern about having separate
> >> relocation for each constant in AOTRuntimeConstants. We have only 2
> >> now. Even they grow I don't think we will have a lot of them.
> Actually
> >> the question: will it simplify code if each constant's address is
> >> treated as separate and not offset in AOTRuntimeConstants?
> Instead of
> >> loading AOTRuntimeConstants address and then adding offset we just
> >> load address of constant.
> >
> > Yes, you are right again: if we register the address of every
> field in
> > the struct in the SCC address table then that will simplify the
> back end
> > code. I was not sure how much we wanted to rely on address table
> > registration but given that we *are* reliant on it just now then we
> > might as well have the benefit.
> >
> > So, we will do the extra registration and limit the back end
> changes to
> > detect ConP occurrences in the AOTRuntimeConstants range and load
> them
> > using an ExternalAddress reloc.
> >
> >> I don't understand question about C1 (I am not familiar with
> it). What
> >> it generates for shifts now? Why you need 3-operand shift?
> >
> > I'm not sure '3-operand' is the right term to use to describe
> what is at
> > stake here. It's really about the kind of operand rather than their
> > number. The current region/card shift operations employ an immediate
> > shift count. The AOT region/card shift operations need to be fed the
> > shift count via a register (the target of the constant load).
> >
> > C1 does currently generate shifts where the shift count is in a
> register
> > but it only does so using the flavours of shift (sar/sal/shr/shl)
> that
> > require the register to be ECX. It doesn't currently generate the
> > alternative instructions (sarx/shlx/shrx) that accept a shift
> count in
> > any register. I'm not 100% sure why but I think this is because
> not all
> > CPUs implement them (they require CPUID feature BM2) or it may be
> to do
> > with them setting/not setting flags.
> >
> > Ashu is proposing adding support for these latter instructions to
> C1 so
> > the barrier code does not have to pin the shift count to ECX. Do we
> > care? Or can we live with the effect of pinning?
> >
> >> Code style: I noticed and aarch64.ad
> <https://urldefense.com/v3/__http://aarch64.ad__;!!ACWV5N9M2RV99hQ!KJr_KHIS5usVHG5WgzaX4UOvAVI4Y2Z-ZoblB2fhuGsj2i-89W0RoSddeMDGtpieLZydGC4pagdnoaYhejMFCA$> still using old stile of
> >> instructions definition using `enc_class`. We don't do this any
> more
> >> (at least in x86*.ad) when we adding new instructions with one
> >> implementation. Please use `ins_encode %{ %}` in such cases.
> >
> > Sure, we will fix that.
> >
> >> I think, if you need additional register you should push/pop to
> >> guarantee we don't stomp over some used value.
> > Ok, we will play safe and do that.
> >
> > One final comment: Much of this discussion is going to be
> side-stepped
> > by the G1 late barrier implementation that is currently working
> its way
> > into mainline. That will remove the need to inject barrier IR
> subgraphs
> > into the C1 and C2 IR graphs and eliminate most (all?) of the
> associated
> > C2 reduction rules from the ad files. I think we should still
> update the
> > patch to accommodate the feedback above. We can simplify it
> further once
> > the G1 changes come in from mainline and pursue in parallel the
> idea of
> > accommodating changes to oop encodings using this model.
> >
> > regards,
> >
> >
> > Andrew Dinn
> > -----------
> >
>
More information about the leyden-dev
mailing list