RFR: 8297359: RISC-V: improve performance of floating Max Min intrinsics

Vladimir Kempik vkempik at openjdk.org
Wed Nov 23 15:30:21 UTC 2022


Please review this change.

It improves the performance of the Math.min/max intrinsics for floats and doubles.

The main issue in these intrinsics is the requirement to return NaN if either argument is NaN. On RISC-V, fmin/fmax return NaN (a quiet NaN) only if both source registers are NaN.
That requires additional logic to handle the case where only one of the sources is NaN.
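
To make the mismatch concrete, here is a minimal Java sketch. The class name FminSemantics and the helper riscvFmin are hypothetical and only model the NaN handling described above; they are not part of the patch and do not claim to be bit-exact with hardware.

```java
public class FminSemantics {
    // Hypothetical model of RISC-V fmin.s NaN handling:
    // NaN is returned only when BOTH inputs are NaN; otherwise the
    // non-NaN operand wins. Not bit-exact with real hardware.
    public static float riscvFmin(float a, float b) {
        boolean aNaN = Float.isNaN(a);
        boolean bNaN = Float.isNaN(b);
        if (aNaN && bNaN) return Float.NaN;
        if (aNaN) return b;
        if (bNaN) return a;
        return Math.min(a, b); // ordinary case, no NaN involved
    }

    public static void main(String[] args) {
        // Java semantics: any NaN argument yields NaN.
        System.out.println("Math.min(1.0f, NaN)  = " + Math.min(1.0f, Float.NaN));
        // Modelled fmin semantics: the NaN is dropped.
        System.out.println("riscvFmin(1.0f, NaN) = " + riscvFmin(1.0f, Float.NaN));
    }
}
```

The gap between the two results in the one-NaN case is what the extra logic in the intrinsic has to close.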

Here the postcheck with flt (floating-point less-than comparison) and flags analysis is replaced with a precheck. The precheck runs fclass on both sources and checks the combined (OR-ed) result; if either source is NaN, NaN is put into dst (using fadd dst, src1, src2).
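
The precheck can be sketched in Java as follows. This is an illustration of the logic only, not the actual macro-assembler code: the class MinMaxPrecheck and its methods are hypothetical, fclassNanBits models just the two NaN bits of the fclass result mask (bit 8 for signaling NaN, bit 9 for quiet NaN), and Math.min stands in for the fmin instruction on the no-NaN path.

```java
public class MinMaxPrecheck {
    // NaN bits of the fclass result mask.
    public static final int FCLASS_SNAN = 1 << 8;
    public static final int FCLASS_QNAN = 1 << 9;

    // Hypothetical, partial model of fclass.s: returns only the NaN bits,
    // 0 for every non-NaN input.
    public static int fclassNanBits(float f) {
        if (!Float.isNaN(f)) return 0;
        int raw = Float.floatToRawIntBits(f);
        boolean quiet = (raw & 0x00400000) != 0; // top mantissa bit set
        return quiet ? FCLASS_QNAN : FCLASS_SNAN;
    }

    // Precheck first, then either propagate NaN via an add or do the min.
    public static float minWithPrecheck(float src1, float src2) {
        int combined = fclassNanBits(src1) | fclassNanBits(src2); // OR the two masks
        if ((combined & (FCLASS_SNAN | FCLASS_QNAN)) != 0) {
            return src1 + src2; // models fadd dst, src1, src2: result is NaN
        }
        return Math.min(src1, src2); // stands in for fmin.s, no NaN possible here
    }

    public static void main(String[] args) {
        System.out.println(minWithPrecheck(1.0f, Float.NaN));
        System.out.println(minWithPrecheck(1.0f, 2.0f));
    }
}
```

Using an addition to produce the result relies on the fact that a floating-point add with a NaN operand yields NaN, so no separate NaN constant has to be loaded.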

Microbench results:

Results on the T-Head C910:
before

Benchmark                      Mode  Cnt      Score     Error  Units
FpMinMaxIntrinsics.dMax        avgt   25  53752.831 ±  97.198  ns/op
FpMinMaxIntrinsics.dMin        avgt   25  53707.229 ± 177.559  ns/op
FpMinMaxIntrinsics.dMinReduce  avgt   25  42805.985 ±   9.901  ns/op
FpMinMaxIntrinsics.fMax        avgt   25  53449.568 ± 215.294  ns/op
FpMinMaxIntrinsics.fMin        avgt   25  53504.106 ± 180.833  ns/op
FpMinMaxIntrinsics.fMinReduce  avgt   25  42794.579 ±   7.013  ns/op
MaxMinOptimizeTest.dAdd        avgt   25    381.138 ±   5.692  us/op
MaxMinOptimizeTest.dMax        avgt   25   4575.094 ±  17.065  us/op
MaxMinOptimizeTest.dMin        avgt   25   4584.648 ±  18.561  us/op
MaxMinOptimizeTest.dMul        avgt   25    384.615 ±   7.751  us/op
MaxMinOptimizeTest.fAdd        avgt   25    318.076 ±   3.308  us/op
MaxMinOptimizeTest.fMax        avgt   25   4405.724 ±  20.353  us/op
MaxMinOptimizeTest.fMin        avgt   25   4421.652 ±  18.029  us/op
MaxMinOptimizeTest.fMul        avgt   25    305.462 ±  19.437  us/op

after
Benchmark                      Mode  Cnt      Score    Error  Units
FpMinMaxIntrinsics.dMax        avgt   25  10712.246 ±  5.607  ns/op
FpMinMaxIntrinsics.dMin        avgt   25  10732.655 ± 41.894  ns/op
FpMinMaxIntrinsics.dMinReduce  avgt   25   3248.106 ±  2.143  ns/op
FpMinMaxIntrinsics.fMax        avgt   25  10707.084 ±  3.276  ns/op
FpMinMaxIntrinsics.fMin        avgt   25  10719.771 ± 14.864  ns/op
FpMinMaxIntrinsics.fMinReduce  avgt   25   3274.775 ±  0.996  ns/op
MaxMinOptimizeTest.dAdd        avgt   25    383.720 ±  8.849  us/op
MaxMinOptimizeTest.dMax        avgt   25    429.345 ± 11.160  us/op
MaxMinOptimizeTest.dMin        avgt   25    439.980 ±  3.757  us/op
MaxMinOptimizeTest.dMul        avgt   25    390.126 ± 10.258  us/op
MaxMinOptimizeTest.fAdd        avgt   25    300.005 ± 18.206  us/op
MaxMinOptimizeTest.fMax        avgt   25    370.467 ±  6.054  us/op
MaxMinOptimizeTest.fMin        avgt   25    375.134 ±  4.568  us/op
MaxMinOptimizeTest.fMul        avgt   25    305.344 ± 18.307  us/op

Results on the HiFive Unmatched:

before

Benchmark                      Mode  Cnt      Score    Error  Units
FpMinMaxIntrinsics.dMax        avgt   25  30234.224 ± 16.744  ns/op
FpMinMaxIntrinsics.dMin        avgt   25  30227.686 ± 15.389  ns/op
FpMinMaxIntrinsics.dMinReduce  avgt   25  15766.749 ±  3.724  ns/op
FpMinMaxIntrinsics.fMax        avgt   25  30140.092 ± 10.243  ns/op
FpMinMaxIntrinsics.fMin        avgt   25  30149.470 ± 34.041  ns/op
FpMinMaxIntrinsics.fMinReduce  avgt   25  15760.770 ±  5.415  ns/op
MaxMinOptimizeTest.dAdd        avgt   25   1155.234 ±  4.603  us/op
MaxMinOptimizeTest.dMax        avgt   25   2597.897 ±  3.307  us/op
MaxMinOptimizeTest.dMin        avgt   25   2599.183 ±  3.806  us/op
MaxMinOptimizeTest.dMul        avgt   25   1155.281 ±  1.813  us/op
MaxMinOptimizeTest.fAdd        avgt   25    750.967 ±  7.254  us/op
MaxMinOptimizeTest.fMax        avgt   25   2305.085 ±  1.556  us/op
MaxMinOptimizeTest.fMin        avgt   25   2305.306 ±  1.478  us/op
MaxMinOptimizeTest.fMul        avgt   25    750.623 ±  7.357  us/op

after (2fclass_new)

Benchmark                      Mode  Cnt      Score    Error  Units
FpMinMaxIntrinsics.dMax        avgt   25  23599.547 ± 29.571  ns/op
FpMinMaxIntrinsics.dMin        avgt   25  23593.236 ± 18.456  ns/op
FpMinMaxIntrinsics.dMinReduce  avgt   25   8630.201 ±  1.353  ns/op
FpMinMaxIntrinsics.fMax        avgt   25  23496.337 ± 18.340  ns/op
FpMinMaxIntrinsics.fMin        avgt   25  23477.881 ±  8.545  ns/op
FpMinMaxIntrinsics.fMinReduce  avgt   25   8629.135 ±  0.869  ns/op
MaxMinOptimizeTest.dAdd        avgt   25   1155.479 ±  4.938  us/op
MaxMinOptimizeTest.dMax        avgt   25   1560.323 ±  3.077  us/op
MaxMinOptimizeTest.dMin        avgt   25   1558.668 ±  2.421  us/op
MaxMinOptimizeTest.dMul        avgt   25   1154.919 ±  2.077  us/op
MaxMinOptimizeTest.fAdd        avgt   25    751.325 ±  7.169  us/op
MaxMinOptimizeTest.fMax        avgt   25   1306.131 ±  1.102  us/op
MaxMinOptimizeTest.fMin        avgt   25   1306.134 ±  0.957  us/op
MaxMinOptimizeTest.fMul        avgt   25    750.968 ±  7.334  us/op
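
For readers who want the before/after ratios, the following small Java sketch computes them for the dMax rows. The scores are copied from the tables above; the class name SpeedupFromTables and the helper speedup are hypothetical, and the ratio is simply the before score divided by the after score (avgt mode, so lower is better).

```java
public class SpeedupFromTables {
    // Ratio of average times: values above 1.0 mean the patched build is faster.
    public static double speedup(double before, double after) {
        return before / after;
    }

    public static void main(String[] args) {
        // T-Head C910, scores taken from the tables above.
        System.out.printf("C910   FpMinMaxIntrinsics.dMax: %.2fx%n",
                speedup(53752.831, 10712.246));
        System.out.printf("C910   MaxMinOptimizeTest.dMax: %.2fx%n",
                speedup(4575.094, 429.345));
        // HiFive Unmatched, scores taken from the tables above.
        System.out.printf("HiFive FpMinMaxIntrinsics.dMax: %.2fx%n",
                speedup(30234.224, 23599.547));
        System.out.printf("HiFive MaxMinOptimizeTest.dMax: %.2fx%n",
                speedup(2597.897, 1560.323));
    }
}
```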

-------------

Commit messages:
 - updated version of 2fclass minmax

Changes: https://git.openjdk.org/jdk/pull/11327/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11327&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8297359
  Stats: 33 lines in 2 files changed: 12 ins; 11 del; 10 mod
  Patch: https://git.openjdk.org/jdk/pull/11327.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/11327/head:pull/11327

PR: https://git.openjdk.org/jdk/pull/11327

