[jdk21u-dev] RFR: 8320379: C2: Sort spilling/unspilling sequence for better ld/st merging into ldp/stp on AArch64
Xiaolong Peng
duke at openjdk.org
Wed Jun 12 16:33:23 UTC 2024
On Wed, 12 Jun 2024 09:07:32 GMT, Andrew Haley <aph at openjdk.org> wrote:
>>> > In other words, on this trivial workload, this saves us ~0.15% of code cache. If this holds for larger apps, this amounts to about 192K for a 128M code cache. This looks like enough of a benefit for the cost of this backport :) The impact would, of course, be even higher when we miss the `ldp` optimization somewhere near heavy spills on a hot path.
>>>
>>> So this is the kind of argument that should have been made.
>>
>> True. Are you satisfied with it?
>>
>>> It's not quite obvious that it'll help a hot path, though. ARMv8.4 made a change that makes LDP and STP atomic in some cases (see "Changes to single-copy atomicity in Armv8.4"). While this is very useful, it is a behavioural change. I'm guessing that there won't be any regressions for spills and fills, because stack-local accesses are uncontended.
>>
>> Yup. I expect that if LDP/STP merging becomes problematic on some platforms, the merging code would somehow account for that. This seems orthogonal to what this patch is doing.
>
>> > > In other words, on this trivial workload, this saves us ~0.15% of code cache. If this holds for larger apps, this amounts to about 192K for a 128M code cache. This looks like enough of a benefit for the cost of this backport :) The impact would, of course, be even higher when we miss the `ldp` optimization somewhere near heavy spills on a hot path.
>> >
>> > So this is the kind of argument that should have been made.
>>
>> True. Are you satisfied with it?
>
> Yes, I think so. It's remarkably effective for such a simple optimization.
Thank you @theRealAph and @shipilev for the review and discussion; I learned a lot from it! I'll start the integration.
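
For readers skimming the thread, here is a minimal sketch of why the order of a spill sequence matters for ld/st merging. It is a toy model, not the C2 or MacroAssembler code: it assumes 8-byte stack slots and treats two stores as mergeable into one `stp` only when they are consecutive in the emitted sequence and their stack offsets differ by exactly one slot. The class name, method names, and offsets are hypothetical.

```java
import java.util.Arrays;

// Toy model only: NOT the HotSpot implementation. All names are hypothetical.
class SpillOrderSketch {
    // Assumption: every spill slot is 8 bytes wide.
    static final int SLOT_BYTES = 8;

    // Greedily counts consecutive stores that could be fused into one pair
    // instruction, assuming two stores fuse only if they are adjacent in the
    // sequence and their stack offsets are exactly one slot apart.
    static int countMergeablePairs(int[] offsets) {
        int pairs = 0;
        int i = 0;
        while (i + 1 < offsets.length) {
            if (Math.abs(offsets[i + 1] - offsets[i]) == SLOT_BYTES) {
                pairs++;
                i += 2; // both stores are consumed by the pair
            } else {
                i++;    // this store stays a single str
            }
        }
        return pairs;
    }

    // Returns the same spills ordered by stack offset, leaving the input intact.
    static int[] sortedCopy(int[] offsets) {
        int[] copy = offsets.clone();
        Arrays.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        // Hypothetical spill sequence as emitted, in stack-offset bytes.
        int[] emitted = {0, 16, 32, 8, 24, 40};
        System.out.println("pairs as emitted: " + countMergeablePairs(emitted));
        System.out.println("pairs when sorted: " + countMergeablePairs(sortedCopy(emitted)));
    }
}
```

In this model the unsorted sequence yields no pairs, while the sorted one pairs every store, halving the instruction count for the sequence. Real code-size savings depend on how often such sequences occur, as the ~0.15% figure quoted above suggests.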
-------------
PR Comment: https://git.openjdk.org/jdk21u-dev/pull/665#issuecomment-2163461573