RFR: 8267968: [PPC64] Use prefixed load and addi instructions for better performance in POWER10 [v4]
Martin Doerr
mdoerr at openjdk.java.net
Thu Jun 10 10:28:22 UTC 2021
On Thu, 10 Jun 2021 07:33:38 GMT, Kazunori Ogata <ogatak at openjdk.org> wrote:
>> The POWER10 processor supports prefixed load and addi instructions that have a larger displacement field of up to 34 bits. We can reduce the instruction cycles needed to load a constant from the TOC and to load an immediate value into a register.
>>
>> Assembler::{load|add}_const_optimized() and LoadCon[LPFD]Nodes are modified to use prefixed instructions, along with fixes to other functions that are affected by this change.
>>
>> I ran the jtreg tests on both POWER10 and POWER8 machines by using "make test-tier1" and verified that this change causes no additional failures. I also ran DaCapo, Renaissance, and SPECjbb2015 on both of them and verified they run successfully.
>
> Kazunori Ogata has updated the pull request incrementally with two additional commits since the last revision:
>
> - Rename paddi_or_addi to paddi_or_addi_r0ok because it accepts R0
> - Remove unreachable code blocks
Thanks for postponing it. We should have nightly tests on Power10 when integrating complex changes.
The independent parts of this change should get evaluated individually. I'm not sure if optimizing load_const_optimized this way is beneficial at all. It doesn't reduce code size AFAICS. And the latency reduction may be pointless if Power10 relies heavily on out-of-order execution, which can hide the latency. We should also check how relevant large constant sections are.
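To make the code size point concrete, here is a minimal sketch, not HotSpot code. It assumes the usual encodings: a classic instruction such as li or lis is 4 bytes, a lis + ori pair is 8 bytes, and a prefixed instruction such as pli (paddi with RA = 0) is 8 bytes, being a 4-byte prefix plus a 4-byte suffix. The helper names fits_signed, classic_bytes, and prefixed_bytes are hypothetical and exist only for this illustration. TOC loads and the longer sequences needed for wider constants are deliberately left out.

```cpp
#include <cstdint>

// True if `value` fits in a signed immediate field that is `bits` wide.
constexpr bool fits_signed(int64_t value, int bits) {
  return value >= -(int64_t{1} << (bits - 1)) &&
         value <=  (int64_t{1} << (bits - 1)) - 1;
}

// Hypothetical estimate: bytes needed to materialize `value` with classic
// (non-prefixed) instructions. Returns 0 when no short sequence applies,
// i.e. a longer sequence or a TOC load would be needed.
constexpr int classic_bytes(int64_t value) {
  if (fits_signed(value, 16)) return 4;        // li
  if (fits_signed(value, 32)) {
    return (value & 0xffff) == 0 ? 4           // lis alone
                                 : 8;          // lis + ori
  }
  return 0;
}

// Hypothetical estimate: bytes needed when a prefixed pli is also available.
constexpr int prefixed_bytes(int64_t value) {
  if (fits_signed(value, 16)) return 4;        // li is still the shortest
  if (fits_signed(value, 34)) return 8;        // pli: 4-byte prefix + 4-byte suffix
  return 0;
}
```

Under these assumptions, a constant such as 0x12345678 costs 8 bytes either way, so pli trades two instructions for one without shrinking the code. A size benefit only appears for values that fit in 34 bits but not in 32, for example 1 << 32.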
-------------
PR: https://git.openjdk.java.net/jdk/pull/4267