RFR: 8267968: [PPC64] Use prefixed load and addi instructions for better performance in POWER10 [v2]
Corey Ashford
cashford at openjdk.java.net
Mon Jun 7 23:39:23 UTC 2021
On Sun, 6 Jun 2021 20:28:27 GMT, Kazunori Ogata <ogatak at openjdk.org> wrote:
>> The POWER10 processor supports prefixed load and addi instructions that have a larger displacement field of up to 34 bits. We can reduce the instruction cycles needed to load a constant from the TOC and to load an immediate value into a register.
>>
>> Assembler::{load|add}_const_optimized() and LoadCon[LPFD]Nodes are modified to use prefixed instructions, and other functions affected by this change are fixed accordingly.
>>
>> I ran the jtreg tests on both POWER10 and POWER8 machines using "make test-tier1" and verified that this change introduces no additional failures. I also ran DaCapo, Renaissance, and SPECjbb2015 on both machines and verified that they run successfully.
>
> Kazunori Ogata has updated the pull request incrementally with one additional commit since the last revision:
>
> Improve comments in macroAssembler_ppc.cpp
I didn't review the details of the commit's functionality, because there are hundreds of details to check there, and to be honest there's a lot I don't understand about working with C2.
Do you have a set of tests that check different sizes of immediate loads to guarantee you hit every case and emit the correct code?
src/hotspot/cpu/ppc/assembler_ppc.cpp line 359:
> 357: code_section()->scratch_emit()) {
> 358: // Always emit a nop if the target is a scratch buffer, otherwise fill_buffer() may raise
> 359: // an assertion failure because the size of actually generated code can be larger than that
size of *the* *actual* generated code
src/hotspot/cpu/ppc/assembler_ppc.cpp line 360:
> 358: // Always emit a nop if the target is a scratch buffer, otherwise fill_buffer() may raise
> 359: // an assertion failure because the size of actually generated code can be larger than that
> 360: // in scratch_emit phase. A difference of code buffer addresses for the two phases can result
in *the* scratch_emit phase.
src/hotspot/cpu/ppc/assembler_ppc.cpp line 362:
> 360: // in scratch_emit phase. A difference of code buffer addresses for the two phases can result
> 361: // in different number of nops for alignment. By emitting a nop before every paddi, we avoid
> 362: // buffer overrun in acrual code generation phase.
*a* buffer overrun in *the* *acrual->actual* code generation phase.
src/hotspot/cpu/ppc/assembler_ppc.cpp line 396:
> 394:
> 395: // pli can require a nop for alignement depending on the code address, so we don't use pli
> 396: // when the caller expects the number of generated code is always the same.
the *amount* of generated code ...
or
the *size* of *the* generated code ...
src/hotspot/cpu/ppc/assembler_ppc.cpp line 454:
> 452: if (xd) { ori( d, d, (unsigned short)xd); }
> 453: } else {
> 454: // Exploit instruction level parallelism if we have a tmp register.
instruction-level (hyphenated)
src/hotspot/cpu/ppc/assembler_ppc.cpp line 600:
> 598: // Case 3: Can use paddi. (However, paddi can require a nop for alignement depending
> 599: // on the code address, so we don't use paddi when the caller
> 600: // expects the number of generated code is always the same.
same comment as earlier about "number" vs. "amount" or "size"
src/hotspot/cpu/ppc/ppc.ad line 6042:
> 6040: // costs do not prevent matching in this case. For that reason the
> 6041: // operand immL_NM with predicate(false) is used.
> 6042: // On Power 10 and up, this instruction is also used for larger offset upto signed 32-bit.
larger *offsets*
src/hotspot/cpu/ppc/ppc.ad line 6327:
> 6325: // costs do not prevent matching in this case. For that reason the
> 6326: // operand immP_NM with predicate(false) is used.
> 6327: // On Power 10 and up, this instruction is also used for larger offset upto signed 32-bit.
*offsets*
src/hotspot/cpu/ppc/ppc.ad line 6397:
> 6395: // costs do not prevent matching in this case. For that reason the
> 6396: // operand immF_NM with predicate(false) is used.
> 6397: // On Power 10 and up, this instruction is also used for larger offset upto signed 32-bit.
*offsets*
src/hotspot/cpu/ppc/ppc.ad line 6472:
> 6470: // costs do not prevent matching in this case. For that reason the
> 6471: // operand immD_NM with predicate(false) is used.
> 6472: // On Power 10 and up, this instruction is also used for larger offset upto signed 32-bit.
*offsets*
-------------
Changes requested by cashford (Author).
PR: https://git.openjdk.java.net/jdk/pull/4267
More information about the hotspot-dev mailing list