RFR: 8297359: RISC-V: improve performance of floating Max Min intrinsics

Vladimir Kempik vkempik at openjdk.org
Fri Nov 25 08:03:59 UTC 2022


On Wed, 23 Nov 2022 15:20:47 GMT, Vladimir Kempik <vkempik at openjdk.org> wrote:

> Please review this change.
> 
> It improves the performance of the Math.min/max intrinsics for floats and doubles.
> 
> The main issue in these intrinsics is the requirement to return NaN if either argument is NaN. On RISC-V, fmin/fmax return NaN (a quiet NaN) only if both source registers are NaN.
> That requires additional logic to handle the case where only one of the sources is NaN.
> 
> Here the postcheck, which used flt (floating-point less-than comparison) and analysis of the flags, is replaced with a precheck. The precheck runs fclass on both sources and checks the combined (OR-ed) result; if either source is NaN, NaN is put into dst (using fadd dst, src1, src2).
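
For readers less familiar with the semantics involved, here is a minimal Java model of the precheck described above. It is only a sketch, not the code from the PR: the class and method names (FpMinPrecheckSketch, fclassNanBits, riscvFmin, minWithPrecheck) are made up for illustration, and the fclass model covers only the two NaN bits (bit 8 for signaling NaN, bit 9 for quiet NaN) that matter here.

```java
public class FpMinPrecheckSketch {
    // NaN-related bits of the fclass result mask:
    // bit 8 = signaling NaN, bit 9 = quiet NaN.
    static final int FCLASS_SNAN = 1 << 8;
    static final int FCLASS_QNAN = 1 << 9;
    static final int FCLASS_NAN = FCLASS_SNAN | FCLASS_QNAN;

    // Hypothetical model of fclass.s, reduced to the NaN bits
    // (all other class bits are left at zero in this sketch).
    static int fclassNanBits(float x) {
        int bits = Float.floatToRawIntBits(x);
        boolean expAllOnes = (bits & 0x7f800000) == 0x7f800000;
        boolean mantissaNonZero = (bits & 0x007fffff) != 0;
        if (!(expAllOnes && mantissaNonZero)) {
            return 0; // not a NaN
        }
        boolean quiet = (bits & 0x00400000) != 0; // top mantissa bit
        return quiet ? FCLASS_QNAN : FCLASS_SNAN;
    }

    // Model of fmin.s: the result is NaN only if both inputs are NaN;
    // with exactly one NaN input, the other input is returned.
    static float riscvFmin(float a, float b) {
        if (Float.isNaN(a) && Float.isNaN(b)) return Float.NaN;
        if (Float.isNaN(a)) return b;
        if (Float.isNaN(b)) return a;
        return Math.min(a, b); // both non-NaN: plain minimum, -0.0 below +0.0
    }

    // Precheck: classify both sources, OR the masks, and if either is NaN
    // produce NaN with an add; otherwise use the fmin model directly.
    static float minWithPrecheck(float src1, float src2) {
        int combined = fclassNanBits(src1) | fclassNanBits(src2);
        if ((combined & FCLASS_NAN) != 0) {
            return src1 + src2; // models "fadd dst, src1, src2"
        }
        return riscvFmin(src1, src2);
    }

    public static void main(String[] args) {
        float[][] cases = {
            {1.0f, 2.0f}, {Float.NaN, 1.0f}, {1.0f, Float.NaN},
            {Float.NaN, Float.NaN}, {-0.0f, 0.0f}
        };
        for (float[] c : cases) {
            System.out.printf("a=%s b=%s fmin=%s precheck=%s Math.min=%s%n",
                c[0], c[1], riscvFmin(c[0], c[1]),
                minWithPrecheck(c[0], c[1]), Math.min(c[0], c[1]));
        }
    }
}
```

Running main prints the raw fmin model next to the prechecked result and Math.min, which makes the single-NaN case easy to compare.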
> 
> Microbench results:
> 
> Results on the T-Head C910:
> before
> 
> Benchmark                      Mode  Cnt      Score     Error  Units
> FpMinMaxIntrinsics.dMax        avgt   25  53752.831 ±  97.198  ns/op
> FpMinMaxIntrinsics.dMin        avgt   25  53707.229 ± 177.559  ns/op
> FpMinMaxIntrinsics.dMinReduce  avgt   25  42805.985 ±   9.901  ns/op
> FpMinMaxIntrinsics.fMax        avgt   25  53449.568 ± 215.294  ns/op
> FpMinMaxIntrinsics.fMin        avgt   25  53504.106 ± 180.833  ns/op
> FpMinMaxIntrinsics.fMinReduce  avgt   25  42794.579 ±   7.013  ns/op
> MaxMinOptimizeTest.dAdd        avgt   25    381.138 ±   5.692  us/op
> MaxMinOptimizeTest.dMax        avgt   25   4575.094 ±  17.065  us/op
> MaxMinOptimizeTest.dMin        avgt   25   4584.648 ±  18.561  us/op
> MaxMinOptimizeTest.dMul        avgt   25    384.615 ±   7.751  us/op
> MaxMinOptimizeTest.fAdd        avgt   25    318.076 ±   3.308  us/op
> MaxMinOptimizeTest.fMax        avgt   25   4405.724 ±  20.353  us/op
> MaxMinOptimizeTest.fMin        avgt   25   4421.652 ±  18.029  us/op
> MaxMinOptimizeTest.fMul        avgt   25    305.462 ±  19.437  us/op
> 
> after
> 
> Benchmark                      Mode  Cnt      Score    Error  Units
> FpMinMaxIntrinsics.dMax        avgt   25  10712.246 ±  5.607  ns/op
> FpMinMaxIntrinsics.dMin        avgt   25  10732.655 ± 41.894  ns/op
> FpMinMaxIntrinsics.dMinReduce  avgt   25   3248.106 ±  2.143  ns/op
> FpMinMaxIntrinsics.fMax        avgt   25  10707.084 ±  3.276  ns/op
> FpMinMaxIntrinsics.fMin        avgt   25  10719.771 ± 14.864  ns/op
> FpMinMaxIntrinsics.fMinReduce  avgt   25   3274.775 ±  0.996  ns/op
> MaxMinOptimizeTest.dAdd        avgt   25    383.720 ±  8.849  us/op
> MaxMinOptimizeTest.dMax        avgt   25    429.345 ± 11.160  us/op
> MaxMinOptimizeTest.dMin        avgt   25    439.980 ±  3.757  us/op
> MaxMinOptimizeTest.dMul        avgt   25    390.126 ± 10.258  us/op
> MaxMinOptimizeTest.fAdd        avgt   25    300.005 ± 18.206  us/op
> MaxMinOptimizeTest.fMax        avgt   25    370.467 ±  6.054  us/op
> MaxMinOptimizeTest.fMin        avgt   25    375.134 ±  4.568  us/op
> MaxMinOptimizeTest.fMul        avgt   25    305.344 ± 18.307  us/op
> 
> Results on the HiFive Unmatched:
> 
> before
> 
> Benchmark                      Mode  Cnt      Score    Error  Units
> FpMinMaxIntrinsics.dMax        avgt   25  30234.224 ± 16.744  ns/op
> FpMinMaxIntrinsics.dMin        avgt   25  30227.686 ± 15.389  ns/op
> FpMinMaxIntrinsics.dMinReduce  avgt   25  15766.749 ±  3.724  ns/op
> FpMinMaxIntrinsics.fMax        avgt   25  30140.092 ± 10.243  ns/op
> FpMinMaxIntrinsics.fMin        avgt   25  30149.470 ± 34.041  ns/op
> FpMinMaxIntrinsics.fMinReduce  avgt   25  15760.770 ±  5.415  ns/op
> MaxMinOptimizeTest.dAdd        avgt   25   1155.234 ±  4.603  us/op
> MaxMinOptimizeTest.dMax        avgt   25   2597.897 ±  3.307  us/op
> MaxMinOptimizeTest.dMin        avgt   25   2599.183 ±  3.806  us/op
> MaxMinOptimizeTest.dMul        avgt   25   1155.281 ±  1.813  us/op
> MaxMinOptimizeTest.fAdd        avgt   25    750.967 ±  7.254  us/op
> MaxMinOptimizeTest.fMax        avgt   25   2305.085 ±  1.556  us/op
> MaxMinOptimizeTest.fMin        avgt   25   2305.306 ±  1.478  us/op
> MaxMinOptimizeTest.fMul        avgt   25    750.623 ±  7.357  us/op
> 
> after (2fclass_new)
> 
> Benchmark                      Mode  Cnt      Score    Error  Units
> FpMinMaxIntrinsics.dMax        avgt   25  23599.547 ± 29.571  ns/op
> FpMinMaxIntrinsics.dMin        avgt   25  23593.236 ± 18.456  ns/op
> FpMinMaxIntrinsics.dMinReduce  avgt   25   8630.201 ±  1.353  ns/op
> FpMinMaxIntrinsics.fMax        avgt   25  23496.337 ± 18.340  ns/op
> FpMinMaxIntrinsics.fMin        avgt   25  23477.881 ±  8.545  ns/op
> FpMinMaxIntrinsics.fMinReduce  avgt   25   8629.135 ±  0.869  ns/op
> MaxMinOptimizeTest.dAdd        avgt   25   1155.479 ±  4.938  us/op
> MaxMinOptimizeTest.dMax        avgt   25   1560.323 ±  3.077  us/op
> MaxMinOptimizeTest.dMin        avgt   25   1558.668 ±  2.421  us/op
> MaxMinOptimizeTest.dMul        avgt   25   1154.919 ±  2.077  us/op
> MaxMinOptimizeTest.fAdd        avgt   25    751.325 ±  7.169  us/op
> MaxMinOptimizeTest.fMax        avgt   25   1306.131 ±  1.102  us/op
> MaxMinOptimizeTest.fMin        avgt   25   1306.134 ±  0.957  us/op
> MaxMinOptimizeTest.fMul        avgt   25    750.968 ±  7.334  us/op
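
Since all scores above are average time per operation (lower is better), the speedup for a row is simply before divided by after. A small sketch of that arithmetic, using the dMax rows copied from the tables; the class name SpeedupSketch and its helper are made up for illustration:

```java
public class SpeedupSketch {
    // Speedup factor for avgt scores: time before divided by time after.
    static double speedup(double before, double after) {
        return before / after;
    }

    public static void main(String[] args) {
        // Scores copied from the dMax rows of the tables above.
        System.out.printf("C910      FpMinMaxIntrinsics.dMax: %.2fx%n",
            speedup(53752.831, 10712.246));
        System.out.printf("C910      MaxMinOptimizeTest.dMax: %.2fx%n",
            speedup(4575.094, 429.345));
        System.out.printf("Unmatched FpMinMaxIntrinsics.dMax: %.2fx%n",
            speedup(30234.224, 23599.547));
        System.out.printf("Unmatched MaxMinOptimizeTest.dMax: %.2fx%n",
            speedup(2597.897, 1560.323));
    }
}
```

The same helper can be applied to any other row; the dAdd/dMul/fAdd/fMul rows serve as controls and should stay close to 1x.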

Thanks for looking at it.
Tier1 passes; the remaining tiers are still running.

-------------

PR: https://git.openjdk.org/jdk/pull/11327


More information about the hotspot-compiler-dev mailing list