RFR: 8255553: [PPC64] Introduce and use setbc and setnbc P10 instructions

Martin Doerr mdoerr at openjdk.java.net
Thu Oct 29 11:25:43 UTC 2020


On Wed, 28 Oct 2020 17:00:43 GMT, Ziviani <github.com+670087+jrziviani at openjdk.org> wrote:

> - setbc RT,BI: sets RT to 1 if CR(BI) is 1, otherwise 0.
> - setnbc RT,BI: sets RT to -1 if CR(BI) is 1, otherwise 0.
> Ref: PowerISA 3.1, page 129.
> 
> These instructions are particularly useful for improving the following
> pattern `(src1<src2)? -1: ((src1>src2)? 1: 0)`, which can be found in
> `instruct cmpL3_reg_reg_ExEx()@ppc.ad`, because they remove its branches.
> 
> Long.toString, which generates such a pattern in getChars, has shown a
> good performance gain when using these new instructions.
> 
> Example:
> for (int i = 0; i < 200_000; i++)
>   res = Long.toString((long)i);
> 
> java -Xcomp -XX:CompileThreshold=1 -XX:-TieredCompilation TestToString
> 
> Without setbc (average): 0.1178 seconds
> With setbc (average): 0.0396 seconds

Hi Jose,
thanks for improving this. Looks correct. I have some ideas to share:
Note that it's also possible to implement this branch-free on processors older than Power10: see LIR_Assembler::comp_fl2i in c1_LIRAssembler_ppc.cpp.
This could also be used for C2 with flagsRegCR0. Maybe you would like to clean up the existing C2 code and remove the old cmovI_conIvalueMinus1_conIvalue0_conIvalue1_Ex and cmovI_conIvalueMinus1_conIvalue1?
You could also optimize C1 and the template interpreter (TemplateTable::lcmp + float_cmp, though the interpreter is less critical) for Power10.
But we can also just take your C2 improvement for Power10 if you don't have time for additional parts.
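
For readers following the thread, here is a minimal Java sketch of the three-way compare pattern under discussion and of how the two instructions described in the quoted text could replace its branches. It only models the stated semantics (setbc yields 1 or 0, setnbc yields -1 or 0, depending on one CR bit). The class and method names are hypothetical, and combining the two results with a bitwise OR is an assumption made for illustration, not necessarily the sequence the patch emits.

```java
class Cmp3Sketch {
    // Model of setbc RT,BI: 1 if the selected CR bit is set, otherwise 0.
    static int setbc(boolean crBit) {
        return crBit ? 1 : 0;
    }

    // Model of setnbc RT,BI: -1 if the selected CR bit is set, otherwise 0.
    static int setnbc(boolean crBit) {
        return crBit ? -1 : 0;
    }

    // The branchy pattern quoted from ppc.ad.
    static int cmp3Branchy(long src1, long src2) {
        return (src1 < src2) ? -1 : ((src1 > src2) ? 1 : 0);
    }

    // Same result from a single compare: the "less than" and "greater than"
    // bits are never both set, so OR-ing the two partial results is safe.
    // (Assumed combination step, for illustration only.)
    static int cmp3WithSetbc(long src1, long src2) {
        boolean lt = src1 < src2; // stands in for the CR "lt" bit
        boolean gt = src1 > src2; // stands in for the CR "gt" bit
        return setnbc(lt) | setbc(gt);
    }

    public static void main(String[] args) {
        long[] samples = {Long.MIN_VALUE, -1L, 0L, 1L, Long.MAX_VALUE};
        for (long a : samples) {
            for (long b : samples) {
                if (cmp3Branchy(a, b) != cmp3WithSetbc(a, b)) {
                    throw new AssertionError("mismatch for " + a + ", " + b);
                }
            }
        }
        System.out.println("both variants agree on all sample pairs");
    }
}
```

The Java model itself still contains conditionals, of course. The point is only that each helper corresponds to one instruction operating on an already computed condition bit, so the machine code needs no conditional branch.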

-------------

PR: https://git.openjdk.java.net/jdk/pull/907


More information about the hotspot-compiler-dev mailing list