[External] : Re: premain: Possible solutions to use runtime G1 region grain size in AOT code (JDK-8335440)
Vladimir Kozlov
vladimir.kozlov at oracle.com
Thu Aug 15 15:02:34 UTC 2024
> Ashu is proposing adding support for these latter instructions to C1 so
> the barrier code does not have to pin the shift count to ECX. Do we
> care? Or can we live with the effect of pinning?
We only care about correctness currently. Performance (registers
spilling) is not issue. And I am concern about size and complexity of
Leyden specific changes which eventually will be pushed into mainline.
Unless you see benefits for new C1 shift instructions in mainline I
would go with using ECX. And again, do this if code for using ECX is
simpler than adding new instructions. If it is not - use new instructions.
I was hopping that "Late Barrier Expansion for G1" JEP may help here too
but it has only C2 changes :(
I agree with updating this patch and may be even pushing it into premain
(for testing) regardless "Late Barrier Expansion for G1" JEP status.
Thanks,
Vladimir K
On 8/15/24 3:18 AM, Andrew Dinn wrote:
> On 15/08/2024 00:50, Vladimir Kozlov wrote:
>> Do we really need AOTRuntimeConstants instance be a stub in CodeCache?
>> Looking on this code and I think this complicate implementation if use
>> ExternalAddress relocation anyway. Can it be static instance in
>> SCCache.cpp which we initialize the same way as in current changes?
>
> Yes, you are right: we don't need the AOTRuntimeConstants struct to be
> embedded in the cache if we are using an ExternalRelocation. The initial
> idea for doing that was because it provides a 32-bit bound on the offset
> from any struct field to a corresponding field load (lea). That would
> allow us to use PC-relative loads of the field addresses -- which saves
> instruction footprint on AArch64, possibly also on x86? However, we can
> only do that if we also implement a new relocation (which seems
> unnecessary).
>
> So, we will redefine the AOTRuntimeConstants struct as a C++ static field.
>
>> I do not understand why you have concern about having separate
>> relocation for each constant in AOTRuntimeConstants. We have only 2
>> now. Even they grow I don't think we will have a lot of them. Actually
>> the question: will it simplify code if each constant's address is
>> treated as separate and not offset in AOTRuntimeConstants? Instead of
>> loading AOTRuntimeConstants address and then adding offset we just
>> load address of constant.
>
> Yes, you are right again: if we register the address of every field in
> the struct in the SCC address table then that will simplify the back end
> code. I was not sure how much we wanted to rely on address table
> registration but given that we *are* reliant on it just now then we
> might as well have the benefit.
>
> So, we will do the extra registration and limit the back end changes to
> detect ConP occurrences in the AOTRuntimeConstants range and load them
> using an ExternalAddress reloc.
>
>> I don't understand question about C1 (I am not familiar with it). What
>> it generates for shifts now? Why you need 3-operand shift?
>
> I'm not sure '3-operand' is the right term to use to describe what is at
> stake here. It's really about the kind of operand rather than their
> number. The current region/card shift operations employ an immediate
> shift count. The AOT region/card shift operations need to be fed the
> shift count via a register (the target of the constant load).
>
> C1 does currently generate shifts where the shift count is in a register
> but it only does so using the flavours of shift (sar/sal/shr/shl) that
> require the register to be ECX. It doesn't currently generate the
> alternative instructions (sarx/shlx/shrx) that accept a shift count in
> any register. I'm not 100% sure why but I think this is because not all
> CPUs implement them (they require CPUID feature BM2) or it may be to do
> with them setting/not setting flags.
>
> Ashu is proposing adding support for these latter instructions to C1 so
> the barrier code does not have to pin the shift count to ECX. Do we
> care? Or can we live with the effect of pinning?
>
>> Code style: I noticed and aarch64.ad still using old stile of
>> instructions definition using `enc_class`. We don't do this any more
>> (at least in x86*.ad) when we adding new instructions with one
>> implementation. Please use `ins_encode %{ %}` in such cases.
>
> Sure, we will fix that.
>
>> I think, if you need additional register you should push/pop to
>> guarantee we don't stomp over some used value.
> Ok, we will play safe and do that.
>
> One final comment: Much of this discussion is going to be side-stepped
> by the G1 late barrier implementation that is currently working its way
> into mainline. That will remove the need to inject barrier IR subgraphs
> into the C1 and C2 IR graphs and eliminate most (all?) of the associated
> C2 reduction rules from the ad files. I think we should still update the
> patch to accommodate the feedback above. We can simplify it further once
> the G1 changes come in from mainline and pursue in parallel the idea of
> accommodating changes to oop encodings using this model.
>
> regards,
>
>
> Andrew Dinn
> -----------
>
More information about the leyden-dev
mailing list