[External] : Re: premain: Possible solutions to use runtime G1 region grain size in AOT code (JDK-8335440)

Ashutosh Mehra asmehra at redhat.com
Thu Aug 15 20:43:22 UTC 2024


Hi Vladimir,
Here is the updated patch. We have refactored the code as per your
suggestions.
https://github.com/openjdk/leyden/compare/premain...ashu-mehra:leyden:JDK-8335440-load-via-codecache?expand=1

Let us know if you have any more suggestions.

Thanks,
- Ashutosh Mehra


On Thu, Aug 15, 2024 at 11:02 AM Vladimir Kozlov <vladimir.kozlov at oracle.com>
wrote:

>  > Ashu is proposing adding support for these latter instructions to C1 so
>  > the barrier code does not have to pin the shift count to ECX. Do we
>  > care? Or can we live with the effect of pinning?
>
> We only care about correctness currently. Performance (registers
> spilling) is not issue. And I am concern about size and complexity of
> Leyden specific changes which eventually will be pushed into mainline.
> Unless you see benefits for new C1 shift instructions in mainline I
> would go with using ECX. And again, do this if code for using ECX is
> simpler than adding new instructions. If it is not - use new instructions.
>
> I was hopping that "Late Barrier Expansion for G1" JEP may help here too
> but it has only C2 changes :(
>
> I agree with updating this patch and may be even pushing it into premain
> (for testing) regardless "Late Barrier Expansion for G1" JEP status.
>
> Thanks,
> Vladimir K
>
>
> On 8/15/24 3:18 AM, Andrew Dinn wrote:
> > On 15/08/2024 00:50, Vladimir Kozlov wrote:
> >> Do we really need AOTRuntimeConstants instance be a stub in CodeCache?
> >> Looking on this code and I think this complicate implementation if use
> >> ExternalAddress relocation anyway. Can it be static instance in
> >> SCCache.cpp which we initialize the same way as in current changes?
> >
> > Yes, you are right: we don't need the AOTRuntimeConstants struct to be
> > embedded in the cache if we are using an ExternalRelocation. The initial
> > idea for doing that was because it provides a 32-bit bound on the offset
> > from any struct field to a corresponding field load (lea). That would
> > allow us to use PC-relative loads of the field addresses -- which saves
> > instruction footprint on AArch64, possibly also on x86? However, we can
> > only do that if we also implement a new relocation (which seems
> > unnecessary).
> >
> > So, we will redefine the AOTRuntimeConstants struct as a C++ static
> field.
> >
> >> I do not understand why you have concern about having separate
> >> relocation for each constant in AOTRuntimeConstants. We have only 2
> >> now. Even they grow I don't think we will have a lot of them. Actually
> >> the question: will it simplify code if each constant's address is
> >> treated as separate and not offset in AOTRuntimeConstants? Instead of
> >> loading AOTRuntimeConstants address and then adding offset we just
> >> load address of constant.
> >
> > Yes, you are right again: if we register the address of every field in
> > the struct in the SCC address table then that will simplify the back end
> > code. I was not sure how much we wanted to rely on address table
> > registration but given that we *are* reliant on it just now then we
> > might as well have the benefit.
> >
> > So, we will do the extra registration and limit the back end changes to
> > detect ConP occurrences in the AOTRuntimeConstants range and load them
> > using an ExternalAddress reloc.
> >
> >> I don't understand question about C1 (I am not familiar with it). What
> >> it generates for shifts now? Why you need 3-operand shift?
> >
> > I'm not sure '3-operand' is the right term to use to describe what is at
> > stake here. It's really about the kind of operand rather than their
> > number. The current region/card shift operations employ an immediate
> > shift count. The AOT region/card shift operations need to be fed the
> > shift count via a register (the target of the constant load).
> >
> > C1 does currently generate shifts where the shift count is in a register
> > but it only does so using the flavours of shift (sar/sal/shr/shl) that
> > require the register to be ECX. It doesn't currently generate the
> > alternative instructions (sarx/shlx/shrx) that accept a shift count in
> > any register. I'm not 100% sure why but I think this is because not all
> > CPUs implement them (they require CPUID feature BM2) or it may be to do
> > with them setting/not setting flags.
> >
> > Ashu is proposing adding support for these latter instructions to C1 so
> > the barrier code does not have to pin the shift count to ECX. Do we
> > care? Or can we live with the effect of pinning?
> >
> >> Code style: I noticed and aarch64.ad still using old stile of
> >> instructions definition using `enc_class`. We don't do this any more
> >> (at least in x86*.ad) when we adding new instructions with one
> >> implementation. Please use `ins_encode %{ %}` in such cases.
> >
> > Sure, we will fix that.
> >
> >> I think, if you need additional register you should push/pop to
> >> guarantee we don't stomp over some used value.
> > Ok, we will play safe and do that.
> >
> > One final comment: Much of this discussion is going to be side-stepped
> > by the G1 late barrier implementation that is currently working its way
> > into mainline. That will remove the need to inject barrier IR subgraphs
> > into the C1 and C2 IR graphs and eliminate most (all?) of the associated
> > C2 reduction rules from the ad files. I think we should still update the
> > patch to accommodate the feedback above. We can simplify it further once
> > the G1 changes come in from mainline and pursue in parallel the idea of
> > accommodating changes to oop encodings using this model.
> >
> > regards,
> >
> >
> > Andrew Dinn
> > -----------
> >
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/leyden-dev/attachments/20240815/9f100168/attachment.htm>


More information about the leyden-dev mailing list