RFR: 8267968: [PPC64] Use prefixed load and addi instructions for better performance in POWER10 [v3]

Wed Jun 9 14:10:16 UTC 2021

On Wed, 9 Jun 2021 11:24:33 GMT, Kazunori Ogata <ogatak at openjdk.org> wrote:

>> The POWER10 processor supports prefixed load and addi instructions with a larger displacement field of up to 34 bits. This lets us reduce the instruction cycles needed to load a constant from the TOC or to load an immediate value into a register.
>> 
>> Assembler::{load|add}_const_optimized() and the LoadCon[LPFD]Nodes are modified to use prefixed instructions, along with fixes to other functions affected by this change.
>> 
>> I ran the jtreg tests on both POWER10 and POWER8 machines using "make test-tier1" and verified that this change introduces no additional failures. I also ran DaCapo, Renaissance, and SPECjbb2015 on both machines and verified that they run successfully.
>
> Kazunori Ogata has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - Revert changes for pushing nodes in loadConLNodesTuple and add comments about the node _last points to
>  - Fix grammatical errors in comments

Sorry for my late response. I was busy with other things. I've looked at this change for some time and I wonder whether such a complex change should be done at all. I like the idea, but does it really improve performance for any real applications or benchmarks? At the very least, the parts that only increase complexity should be dropped.

src/hotspot/cpu/ppc/assembler_ppc.cpp line 352:

> 350: void Assembler::paddi_or_addi(Register d, Register s, long si34) {
> 351:   if (is_simm16(si34)) {
> 352:     addi_r0ok(d, s, (int)si34);

If r0 is OK, the function should be named paddi_or_addi_r0ok, and callers should assert that r0 is not used as the source of a real addition.
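
To make the naming point concrete, here is a minimal standalone sketch, not HotSpot code. The names fits_simm, AddForm, select_add_form_r0ok and select_add_form are hypothetical and exist only for illustration. It relies on the fact that an RA field of 0 in addi (and in paddi) denotes the literal value 0 rather than the contents of r0, so the "_r0ok" variant accepts any source register number while the plain variant asserts that the source is not r0.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the assembler's range checks:
// true if x is representable as a signed immediate of `bits` bits.
static bool fits_simm(int64_t x, unsigned bits) {
  const int64_t lo = -(int64_t(1) << (bits - 1));
  const int64_t hi = (int64_t(1) << (bits - 1)) - 1;
  return lo <= x && x <= hi;
}

enum class AddForm { Addi, Paddi };

// "_r0ok" variant: the caller knows that a source register number of 0
// is read as the literal 0, so no register check is made here.
static AddForm select_add_form_r0ok(int64_t si34) {
  assert(fits_simm(si34, 34) && "immediate must fit in 34 bits");
  // Prefer the 4-byte addi when the immediate fits in 16 bits,
  // otherwise fall back to the 8-byte prefixed paddi.
  return fits_simm(si34, 16) ? AddForm::Addi : AddForm::Paddi;
}

// Plain variant for a real addition: r0 as source would silently
// turn the add into a load-immediate, so reject it up front.
static AddForm select_add_form(unsigned src_reg, int64_t si34) {
  assert(src_reg != 0 && "r0 as source means literal 0, not a register");
  return select_add_form_r0ok(si34);
}
```

With this split, the name alone tells a caller which contract applies, and an accidental use of r0 in a real addition fails in debug builds instead of producing wrong code.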

src/hotspot/cpu/ppc/assembler_ppc.cpp line 364:

> 362:       // we avoid a buffer overrun in the actual code generation phase.
> 363:       nop();
> 364:     }

Scratch emit should be able to determine the size precisely, not just produce a pessimistic estimate. Please don't break this design.
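
For illustration only, a sketch of what a precise size computation could look like. It assumes the offset of the instruction relative to a 64-byte-aligned base is known when the size is computed, which may not hold in every emit path. The helper names are hypothetical. The underlying rule is that a prefixed instruction occupies 8 bytes and must not cross a 64-byte boundary, so a nop is needed only when the prefix word would be the last word before such a boundary.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kInstrSize    = 4;   // one PPC64 instruction word
constexpr std::size_t kPrefixedSize = 8;   // prefix word plus suffix word
constexpr std::size_t kBoundary     = 64;  // prefixed instructions must not cross this

// Hypothetical helper: true if a prefixed instruction starting at `offset`
// (relative to a 64-byte-aligned base, an assumption of this sketch)
// would straddle a 64-byte boundary.
static bool needs_alignment_nop(std::size_t offset) {
  assert(offset % kInstrSize == 0 && "instructions are word aligned");
  return offset % kBoundary == kBoundary - kInstrSize;
}

// Exact number of bytes emitted for one prefixed instruction at `offset`:
// 12 when a nop must be inserted first, 8 otherwise.
static std::size_t prefixed_emit_size(std::size_t offset) {
  return needs_alignment_nop(offset) ? kInstrSize + kPrefixedSize
                                     : kPrefixedSize;
}
```

If the offset cannot be known at that point, guaranteeing the alignment some other way would still allow a fixed, exact size instead of a worst-case one.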

src/hotspot/cpu/ppc/ppc.ad line 6400:

> 6398: 
> 6399:   format %{ "LFS     $dst, offset, $toc \t// load float $src from TOC" %}
> 6400:   size(8);

Sizes should be precise.

-------------

PR: https://git.openjdk.java.net/jdk/pull/4267