RFR: 8297359: RISC-V: improve performance of floating Max Min intrinsics
Vladimir Kempik
vkempik at openjdk.org
Wed Nov 23 15:30:21 UTC 2022
Please review this change.
It improves performance of Math.min/max intrinsics for Floats and Doubles.
The main issue in these intrinsics is the requirement to return NaN if any of the arguments is NaN. On RISC-V, fmin/fmax return NaN (a quiet NaN) only if both source registers are NaN.
That requires additional logic to handle the case where only one of the sources is NaN.
This change replaces the post-check, which used flt (floating-point less-than comparison) and flags analysis, with a pre-check. The pre-check runs fclass on both sources and then tests the combined (OR-ed) result; if either source is NaN, NaN is put into dst (using fadd dst, src1, src2).
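To illustrate the semantics, here is a minimal Java model of the pre-check described above. It is not the code from the patch: the class and method names are hypothetical, fclass is modeled only for its two NaN bits (bit 8 signaling NaN, bit 9 quiet NaN), and riscvFmin is a simplified emulation of the fmin.s behavior.

```java
class MinMaxPrecheckSketch {
    // Bits 8 (signaling NaN) and 9 (quiet NaN) of the fclass result.
    static final int FCLASS_NAN_MASK = 0b11_0000_0000;

    // Hypothetical, simplified model of fclass.s: only the quiet-NaN bit is
    // ever set here, since Java cannot portably distinguish signaling NaNs.
    static int fclassNanBits(float x) {
        return Float.isNaN(x) ? (1 << 9) : 0;
    }

    // Simplified model of RISC-V fmin.s: a single NaN operand is ignored,
    // and NaN is returned only when both operands are NaN.
    static float riscvFmin(float a, float b) {
        if (Float.isNaN(a) && Float.isNaN(b)) return Float.NaN;
        if (Float.isNaN(a)) return b;
        if (Float.isNaN(b)) return a;
        return Math.min(a, b); // neither operand is NaN on this path
    }

    // Model of the pre-check: classify both sources, OR the results,
    // and take the NaN path only if either source is NaN.
    static float minWithPrecheck(float src1, float src2) {
        int cls = fclassNanBits(src1) | fclassNanBits(src2);
        if ((cls & FCLASS_NAN_MASK) != 0) {
            return src1 + src2; // models fadd dst, src1, src2: NaN in, NaN out
        }
        return riscvFmin(src1, src2);
    }
}
```

In this model, riscvFmin(Float.NaN, 1.0f) yields 1.0f, while minWithPrecheck(Float.NaN, 1.0f) yields NaN, matching what Math.min requires.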
Microbench results:
Results on the T-Head C910:
before
Benchmark Mode Cnt Score Error Units
FpMinMaxIntrinsics.dMax avgt 25 53752.831 ± 97.198 ns/op
FpMinMaxIntrinsics.dMin avgt 25 53707.229 ± 177.559 ns/op
FpMinMaxIntrinsics.dMinReduce avgt 25 42805.985 ± 9.901 ns/op
FpMinMaxIntrinsics.fMax avgt 25 53449.568 ± 215.294 ns/op
FpMinMaxIntrinsics.fMin avgt 25 53504.106 ± 180.833 ns/op
FpMinMaxIntrinsics.fMinReduce avgt 25 42794.579 ± 7.013 ns/op
MaxMinOptimizeTest.dAdd avgt 25 381.138 ± 5.692 us/op
MaxMinOptimizeTest.dMax avgt 25 4575.094 ± 17.065 us/op
MaxMinOptimizeTest.dMin avgt 25 4584.648 ± 18.561 us/op
MaxMinOptimizeTest.dMul avgt 25 384.615 ± 7.751 us/op
MaxMinOptimizeTest.fAdd avgt 25 318.076 ± 3.308 us/op
MaxMinOptimizeTest.fMax avgt 25 4405.724 ± 20.353 us/op
MaxMinOptimizeTest.fMin avgt 25 4421.652 ± 18.029 us/op
MaxMinOptimizeTest.fMul avgt 25 305.462 ± 19.437 us/op
after
Benchmark Mode Cnt Score Error Units
FpMinMaxIntrinsics.dMax avgt 25 10712.246 ± 5.607 ns/op
FpMinMaxIntrinsics.dMin avgt 25 10732.655 ± 41.894 ns/op
FpMinMaxIntrinsics.dMinReduce avgt 25 3248.106 ± 2.143 ns/op
FpMinMaxIntrinsics.fMax avgt 25 10707.084 ± 3.276 ns/op
FpMinMaxIntrinsics.fMin avgt 25 10719.771 ± 14.864 ns/op
FpMinMaxIntrinsics.fMinReduce avgt 25 3274.775 ± 0.996 ns/op
MaxMinOptimizeTest.dAdd avgt 25 383.720 ± 8.849 us/op
MaxMinOptimizeTest.dMax avgt 25 429.345 ± 11.160 us/op
MaxMinOptimizeTest.dMin avgt 25 439.980 ± 3.757 us/op
MaxMinOptimizeTest.dMul avgt 25 390.126 ± 10.258 us/op
MaxMinOptimizeTest.fAdd avgt 25 300.005 ± 18.206 us/op
MaxMinOptimizeTest.fMax avgt 25 370.467 ± 6.054 us/op
MaxMinOptimizeTest.fMin avgt 25 375.134 ± 4.568 us/op
MaxMinOptimizeTest.fMul avgt 25 305.344 ± 18.307 us/op
Results on the HiFive Unmatched:
before
Benchmark Mode Cnt Score Error Units
FpMinMaxIntrinsics.dMax avgt 25 30234.224 ± 16.744 ns/op
FpMinMaxIntrinsics.dMin avgt 25 30227.686 ± 15.389 ns/op
FpMinMaxIntrinsics.dMinReduce avgt 25 15766.749 ± 3.724 ns/op
FpMinMaxIntrinsics.fMax avgt 25 30140.092 ± 10.243 ns/op
FpMinMaxIntrinsics.fMin avgt 25 30149.470 ± 34.041 ns/op
FpMinMaxIntrinsics.fMinReduce avgt 25 15760.770 ± 5.415 ns/op
MaxMinOptimizeTest.dAdd avgt 25 1155.234 ± 4.603 us/op
MaxMinOptimizeTest.dMax avgt 25 2597.897 ± 3.307 us/op
MaxMinOptimizeTest.dMin avgt 25 2599.183 ± 3.806 us/op
MaxMinOptimizeTest.dMul avgt 25 1155.281 ± 1.813 us/op
MaxMinOptimizeTest.fAdd avgt 25 750.967 ± 7.254 us/op
MaxMinOptimizeTest.fMax avgt 25 2305.085 ± 1.556 us/op
MaxMinOptimizeTest.fMin avgt 25 2305.306 ± 1.478 us/op
MaxMinOptimizeTest.fMul avgt 25 750.623 ± 7.357 us/op
after (2fclass_new)
Benchmark Mode Cnt Score Error Units
FpMinMaxIntrinsics.dMax avgt 25 23599.547 ± 29.571 ns/op
FpMinMaxIntrinsics.dMin avgt 25 23593.236 ± 18.456 ns/op
FpMinMaxIntrinsics.dMinReduce avgt 25 8630.201 ± 1.353 ns/op
FpMinMaxIntrinsics.fMax avgt 25 23496.337 ± 18.340 ns/op
FpMinMaxIntrinsics.fMin avgt 25 23477.881 ± 8.545 ns/op
FpMinMaxIntrinsics.fMinReduce avgt 25 8629.135 ± 0.869 ns/op
MaxMinOptimizeTest.dAdd avgt 25 1155.479 ± 4.938 us/op
MaxMinOptimizeTest.dMax avgt 25 1560.323 ± 3.077 us/op
MaxMinOptimizeTest.dMin avgt 25 1558.668 ± 2.421 us/op
MaxMinOptimizeTest.dMul avgt 25 1154.919 ± 2.077 us/op
MaxMinOptimizeTest.fAdd avgt 25 751.325 ± 7.169 us/op
MaxMinOptimizeTest.fMax avgt 25 1306.131 ± 1.102 us/op
MaxMinOptimizeTest.fMin avgt 25 1306.134 ± 0.957 us/op
MaxMinOptimizeTest.fMul avgt 25 750.968 ± 7.334 us/op
-------------
Commit messages:
- updated version of 2fclass minmax
Changes: https://git.openjdk.org/jdk/pull/11327/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11327&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8297359
Stats: 33 lines in 2 files changed: 12 ins; 11 del; 10 mod
Patch: https://git.openjdk.org/jdk/pull/11327.diff
Fetch: git fetch https://git.openjdk.org/jdk pull/11327/head:pull/11327
PR: https://git.openjdk.org/jdk/pull/11327