RFR: 8267968: [PPC64] Use prefixed load and addi instructions for better performance in POWER10 [v3]
Martin Doerr
mdoerr at openjdk.java.net
Wed Jun 9 14:10:16 UTC 2021
On Wed, 9 Jun 2021 11:24:33 GMT, Kazunori Ogata <ogatak at openjdk.org> wrote:
>> The POWER10 processor supports prefixed load and addi instructions that have a larger displacement field of up to 34 bits. This lets us reduce the instruction cycles needed to load a constant from the TOC and to load an immediate value into a register.
>>
>> Assembler::{load|add}_const_optimized() and LoadCon[LPFD]Nodes are modified to use prefixed instructions, and other functions affected by this change are fixed accordingly.
>>
>> I ran the jtreg tests on both POWER10 and POWER8 machines using "make test-tier1" and verified that this change causes no additional failures. I also ran DaCapo, Renaissance, and SPECjbb2015 on both machines and verified that they run successfully.
>
> Kazunori Ogata has updated the pull request incrementally with two additional commits since the last revision:
>
> - Revert changes for pushing nodes in loadConLNodesTuple and add comments about the node _last points to
> - Fix grammatical errors in comments
Sorry for my late response. I was busy with other things. I've looked at this change for some time and I wonder if such a complex change should be done at all. I like the idea, but does it really improve performance for any real applications or benchmarks? At the very least, the parts that only increase complexity should be left out.
src/hotspot/cpu/ppc/assembler_ppc.cpp line 352:
> 350: void Assembler::paddi_or_addi(Register d, Register s, long si34) {
> 351: if (is_simm16(si34)) {
> 352: addi_r0ok(d, s, (int)si34);
If r0 is ok, the function should be named paddi_or_addi_r0ok, and callers should assert that r0 is not used for a real addition.
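To make the selection logic under discussion concrete, here is a minimal standalone sketch (not HotSpot code) of choosing between the 4-byte addi form and the 8-byte prefixed paddi form by immediate range. The names fits_simm, AddForm, Choice, and choose_add_form are hypothetical; the 16-bit and 34-bit signed ranges are taken from the quoted is_simm16 check and the PR description.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if v fits in a signed immediate of `bits` bits.
constexpr bool fits_simm(int64_t v, unsigned bits) {
  return v >= -(int64_t(1) << (bits - 1)) && v < (int64_t(1) << (bits - 1));
}

// Hypothetical description of which instruction form would be emitted.
enum class AddForm { Addi, Paddi };

struct Choice {
  AddForm form;
  int size_bytes;  // 4 for a regular instruction, 8 for a prefixed one
};

// Pick the short form when the immediate fits in 16 bits, else the
// prefixed form, which requires the immediate to fit in 34 bits.
inline Choice choose_add_form(int64_t imm) {
  if (fits_simm(imm, 16)) {
    return {AddForm::Addi, 4};
  }
  assert(fits_simm(imm, 34) && "immediate does not fit in 34 bits");
  return {AddForm::Paddi, 8};
}
```

On Power, addi with r0 as the source register operand uses the value 0 rather than the contents of r0, which is why a helper that tolerates r0 is worth marking with an _r0ok suffix.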
src/hotspot/cpu/ppc/assembler_ppc.cpp line 364:
> 362: // we avoid a buffer overrun in the actual code generation phase.
> 363: nop();
> 364: }
Scratch emit should be able to determine the size precisely, not just produce a pessimistic estimate. Please don't break this design.
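As a rough illustration of what a precise size could look like, the sketch below computes the exact byte count of a prefixed instruction from the offset at which it would be placed. It assumes the emitter knows that offset and that the padding nop exists because an 8-byte prefixed instruction must not cross a 64-byte boundary; the constant and function names are hypothetical and this is not the HotSpot implementation.

```cpp
#include <cassert>
#include <cstdint>

constexpr int kInsnBytes = 4;      // size of a regular instruction (and of a nop)
constexpr int kPrefixedBytes = 8;  // size of a prefixed instruction
constexpr int kBoundary = 64;      // boundary a prefixed instruction must not cross

// With 4-byte aligned offsets, an 8-byte instruction straddles a 64-byte
// boundary only when it starts in the last 4 bytes of a 64-byte block.
constexpr bool crosses_boundary(uint64_t offset) {
  return (offset % kBoundary) == uint64_t(kBoundary - kInsnBytes);
}

// Exact number of bytes emitted for a prefixed instruction at `offset`:
// 8 bytes normally, 12 bytes when a padding nop has to precede it.
constexpr int prefixed_size_at(uint64_t offset) {
  return crosses_boundary(offset) ? kInsnBytes + kPrefixedBytes : kPrefixedBytes;
}
```

Under these assumptions the size depends only on the placement offset, so a measuring pass that sees the same offset as the real emission pass would report the same size.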
src/hotspot/cpu/ppc/ppc.ad line 6400:
> 6398:
> 6399: format %{ "LFS $dst, offset, $toc \t// load float $src from TOC" %}
> 6400: size(8);
Sizes should be precise.
-------------
PR: https://git.openjdk.java.net/jdk/pull/4267