RFR: 8322790: RISC-V: Tune costs for shuffles with no conversion

Vladimir Kempik vkempik at openjdk.org
Tue Jan 2 10:59:47 UTC 2024


On Tue, 2 Jan 2024 10:55:23 GMT, Fei Yang <fyang at openjdk.org> wrote:

>> Hi all, please review this small change to the insertion costs of RISC-V nodes.
>> We currently have several nodes that implement shuffles without conversion: https://github.com/openjdk/jdk/blob/32d80e2caf6063b58128bd5f3dc87b276f3bd0cb/src/hotspot/cpu/riscv/riscv.ad#L8525-L8741
>> On most RISC-V CPUs we prefer reg<->reg operations because they are faster, but stack<->reg operations are currently used (for the reasons, please see the linked JBS issue).
>> After changing the insertion costs, reg<->reg operations are selected, and we see performance improvements in benchmarks that use such shuffles (tested on a T-Head C910 board):
>> |              Benchmark              | Upstream build (ops/ms) | Patched build (ops/ms) | Difference (%) |
>> |:-----------------------------------:|:-----------------------:|:----------------------:|:--------------:|
>> | MathBench.doubleToRawLongBitsDouble |        30935.139        |        32171.761       |     +4.00      |
>> |      StrictMathBench.ceilDouble     |        24682.810        |        29782.050       |     +20.66     |
>> |      StrictMathBench.cosDouble      |         6948.309        |        6938.276        |     -0.14      |
>> |      StrictMathBench.expDouble      |         6816.143        |        7211.021        |     +5.79      |
>> |     StrictMathBench.floorDouble     |        30699.630        |        34189.509       |     +11.37     |
>> |      StrictMathBench.maxDouble      |        35157.355        |        34675.191       |     -1.37      |
>> |      StrictMathBench.minDouble      |        35192.135        |        35183.015       |     -0.03      |
>> |      StrictMathBench.sinDouble      |         6698.405        |        6721.809        |     +0.35      |
>> 
>> New benchmark for changed nodes:
>> 
>> --- a/test/micro/org/openjdk/bench/java/lang/MathBench.java
>> +++ b/test/micro/org/openjdk/bench/java/lang/MathBench.java
>> @@ -540,4 +540,11 @@ public class MathBench {
>>          return  Math.ulp(float7);
>>      }
>>  
>> +    @Benchmark
>> +    public long doubleToRawLongBitsDouble() {
>> +        double dbl162Dot5 = double81 * 2.0d + double0Dot5;
>> +        double dbl3 = double2 + double1;
>> +        return Double.doubleToRawLongBits(dbl162Dot5) + Double.doubleToRawLongBits(dbl3);
>> +    }
>> +
>
> src/hotspot/cpu/riscv/riscv.ad line 8534:
> 
>> 8532:   effect(DEF dst, USE src);
>> 8533: 
>> 8534:   ins_cost(ALU_COST + LOAD_COST);
> 
> Adding an extra cost of `ALU_COST` for these load/store nodes looks a bit strange to me. I suppose only lowering the cost for those fmv.x.w/fmv.w.x/fmv.x.d/fmv.d.x nodes will do?

Those nodes would then need a cost below 100, which starts to look ugly.
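
To illustrate the trade-off, here is a minimal sketch, not HotSpot code. It assumes a simplified model in which the reg<->reg rule is picked only when its cost is strictly lower than the competing stack rule, and in which "round" costs are multiples of 100. The class name, method names and the 100-unit step are hypothetical and only serve the illustration; the real matcher is more involved.

```java
public class ShuffleCostSketch {
    // Assumption for illustration: "nice" costs are multiples of this unit.
    static final int COST_UNIT = 100;

    // Simplified model: the reg<->reg rule wins only if it is strictly cheaper
    // than the competing stack<->reg rule. Tie-breaking is not modelled.
    static boolean regRegSelected(int stackRuleCost, int regRegCost) {
        return regRegCost < stackRuleCost;
    }

    // Largest positive multiple of COST_UNIT that is strictly below the
    // stack rule cost, or -1 if no such round value exists.
    static int largestRoundCostBelow(int stackRuleCost) {
        int candidate = ((stackRuleCost - 1) / COST_UNIT) * COST_UNIT;
        return candidate > 0 ? candidate : -1;
    }

    public static void main(String[] args) {
        int stackCost = 100;                    // threshold mentioned in this thread
        int raisedStackCost = stackCost + 100;  // stack rule with a hypothetical extra 100 added

        // Lowering only the reg<->reg cost: no round value fits below 100.
        System.out.println("round reg<->reg cost below " + stackCost + ": "
                + largestRoundCostBelow(stackCost));

        // Raising the stack rule cost instead leaves room for a round value.
        System.out.println("round reg<->reg cost below " + raisedStackCost + ": "
                + largestRoundCostBelow(raisedStackCost));
        System.out.println("reg<->reg selected at cost 100 vs " + raisedStackCost + ": "
                + regRegSelected(raisedStackCost, 100));
    }
}
```

Under these assumptions, lowering only the reg<->reg cost forces a non-round value such as 99, while raising the stack rule cost keeps every value a multiple of 100.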

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17206#discussion_r1439342747

