RFR: 8297359: RISC-V: improve performance of floating Max Min intrinsics
Vladimir Kempik
vkempik at openjdk.org
Sat Nov 26 16:20:11 UTC 2022
On Wed, 23 Nov 2022 15:20:47 GMT, Vladimir Kempik <vkempik at openjdk.org> wrote:
> Please review this change.
>
> It improves the performance of the Math.min/max intrinsics for floats and doubles.
>
> The main issue in these intrinsics is the requirement to return NaN if any of the arguments is NaN. On RISC-V, fmin/fmax return NaN (quiet NaN) only if both src registers are NaN.
> That requires additional logic to handle the case where only one of the src registers is NaN.
>
> Here the postcheck with flt (floating-point less-than comparison) and flags analysis is replaced with a precheck. The precheck runs fclass on both src registers and then checks the combined (OR-ed) result; if either src is NaN, NaN is put into dst (using fadd dst, src1, src2).
>
> Microbench results:
>
> The results on the T-Head C910:
> before
>
> Benchmark Mode Cnt Score Error Units
> FpMinMaxIntrinsics.dMax avgt 25 53752.831 ± 97.198 ns/op
> FpMinMaxIntrinsics.dMin avgt 25 53707.229 ± 177.559 ns/op
> FpMinMaxIntrinsics.dMinReduce avgt 25 42805.985 ± 9.901 ns/op
> FpMinMaxIntrinsics.fMax avgt 25 53449.568 ± 215.294 ns/op
> FpMinMaxIntrinsics.fMin avgt 25 53504.106 ± 180.833 ns/op
> FpMinMaxIntrinsics.fMinReduce avgt 25 42794.579 ± 7.013 ns/op
> MaxMinOptimizeTest.dAdd avgt 25 381.138 ± 5.692 us/op
> MaxMinOptimizeTest.dMax avgt 25 4575.094 ± 17.065 us/op
> MaxMinOptimizeTest.dMin avgt 25 4584.648 ± 18.561 us/op
> MaxMinOptimizeTest.dMul avgt 25 384.615 ± 7.751 us/op
> MaxMinOptimizeTest.fAdd avgt 25 318.076 ± 3.308 us/op
> MaxMinOptimizeTest.fMax avgt 25 4405.724 ± 20.353 us/op
> MaxMinOptimizeTest.fMin avgt 25 4421.652 ± 18.029 us/op
> MaxMinOptimizeTest.fMul avgt 25 305.462 ± 19.437 us/op
>
> after
> Benchmark Mode Cnt Score Error Units
> FpMinMaxIntrinsics.dMax avgt 25 10712.246 ± 5.607 ns/op
> FpMinMaxIntrinsics.dMin avgt 25 10732.655 ± 41.894 ns/op
> FpMinMaxIntrinsics.dMinReduce avgt 25 3248.106 ± 2.143 ns/op
> FpMinMaxIntrinsics.fMax avgt 25 10707.084 ± 3.276 ns/op
> FpMinMaxIntrinsics.fMin avgt 25 10719.771 ± 14.864 ns/op
> FpMinMaxIntrinsics.fMinReduce avgt 25 3274.775 ± 0.996 ns/op
> MaxMinOptimizeTest.dAdd avgt 25 383.720 ± 8.849 us/op
> MaxMinOptimizeTest.dMax avgt 25 429.345 ± 11.160 us/op
> MaxMinOptimizeTest.dMin avgt 25 439.980 ± 3.757 us/op
> MaxMinOptimizeTest.dMul avgt 25 390.126 ± 10.258 us/op
> MaxMinOptimizeTest.fAdd avgt 25 300.005 ± 18.206 us/op
> MaxMinOptimizeTest.fMax avgt 25 370.467 ± 6.054 us/op
> MaxMinOptimizeTest.fMin avgt 25 375.134 ± 4.568 us/op
> MaxMinOptimizeTest.fMul avgt 25 305.344 ± 18.307 us/op
>
> The results on the HiFive Unmatched:
>
> before
>
> Benchmark Mode Cnt Score Error Units
> FpMinMaxIntrinsics.dMax avgt 25 30234.224 ± 16.744 ns/op
> FpMinMaxIntrinsics.dMin avgt 25 30227.686 ± 15.389 ns/op
> FpMinMaxIntrinsics.dMinReduce avgt 25 15766.749 ± 3.724 ns/op
> FpMinMaxIntrinsics.fMax avgt 25 30140.092 ± 10.243 ns/op
> FpMinMaxIntrinsics.fMin avgt 25 30149.470 ± 34.041 ns/op
> FpMinMaxIntrinsics.fMinReduce avgt 25 15760.770 ± 5.415 ns/op
> MaxMinOptimizeTest.dAdd avgt 25 1155.234 ± 4.603 us/op
> MaxMinOptimizeTest.dMax avgt 25 2597.897 ± 3.307 us/op
> MaxMinOptimizeTest.dMin avgt 25 2599.183 ± 3.806 us/op
> MaxMinOptimizeTest.dMul avgt 25 1155.281 ± 1.813 us/op
> MaxMinOptimizeTest.fAdd avgt 25 750.967 ± 7.254 us/op
> MaxMinOptimizeTest.fMax avgt 25 2305.085 ± 1.556 us/op
> MaxMinOptimizeTest.fMin avgt 25 2305.306 ± 1.478 us/op
> MaxMinOptimizeTest.fMul avgt 25 750.623 ± 7.357 us/op
>
> after (2fclass_new)
>
> Benchmark Mode Cnt Score Error Units
> FpMinMaxIntrinsics.dMax avgt 25 23599.547 ± 29.571 ns/op
> FpMinMaxIntrinsics.dMin avgt 25 23593.236 ± 18.456 ns/op
> FpMinMaxIntrinsics.dMinReduce avgt 25 8630.201 ± 1.353 ns/op
> FpMinMaxIntrinsics.fMax avgt 25 23496.337 ± 18.340 ns/op
> FpMinMaxIntrinsics.fMin avgt 25 23477.881 ± 8.545 ns/op
> FpMinMaxIntrinsics.fMinReduce avgt 25 8629.135 ± 0.869 ns/op
> MaxMinOptimizeTest.dAdd avgt 25 1155.479 ± 4.938 us/op
> MaxMinOptimizeTest.dMax avgt 25 1560.323 ± 3.077 us/op
> MaxMinOptimizeTest.dMin avgt 25 1558.668 ± 2.421 us/op
> MaxMinOptimizeTest.dMul avgt 25 1154.919 ± 2.077 us/op
> MaxMinOptimizeTest.fAdd avgt 25 751.325 ± 7.169 us/op
> MaxMinOptimizeTest.fMax avgt 25 1306.131 ± 1.102 us/op
> MaxMinOptimizeTest.fMin avgt 25 1306.134 ± 0.957 us/op
> MaxMinOptimizeTest.fMul avgt 25 750.968 ± 7.334 us/op
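
For readers unfamiliar with the precheck described in the quoted message, here is a minimal Java sketch of the logic. It is only a software model, not the HotSpot assembler code from the PR. The class FminSketch and all of its methods are hypothetical names; riscvFmin models the fmin behaviour the message describes, and the 0x300 mask assumes the RISC-V fclass encoding in which bits 8 and 9 flag a signaling and a quiet NaN.

```java
class FminSketch {
    // Assumption: fclass sets bit 8 for a signaling NaN and bit 9 for a quiet NaN.
    static final int FCLASS_NAN_MASK = 0x300;

    // Hypothetical model of fclass.s, limited to the two NaN bits.
    static int fclassNanBits(float x) {
        if (!Float.isNaN(x)) {
            return 0;
        }
        // The top mantissa bit distinguishes quiet from signaling NaNs.
        boolean quiet = (Float.floatToRawIntBits(x) & 0x00400000) != 0;
        return quiet ? (1 << 9) : (1 << 8);
    }

    // Model of fmin.s as described above: NaN only when both inputs are NaN.
    static float riscvFmin(float a, float b) {
        if (Float.isNaN(a)) {
            return Float.isNaN(b) ? Float.NaN : b;
        }
        if (Float.isNaN(b)) {
            return a;
        }
        return Math.min(a, b);
    }

    // Precheck: classify both inputs, OR the results, test the NaN bits.
    static float minWithPrecheck(float a, float b) {
        int combined = fclassNanBits(a) | fclassNanBits(b);
        if ((combined & FCLASS_NAN_MASK) != 0) {
            return a + b; // stands in for fadd dst, src1, src2: yields NaN
        }
        return riscvFmin(a, b);
    }

    public static void main(String[] args) {
        System.out.println(riscvFmin(Float.NaN, 1.0f));
        System.out.println(minWithPrecheck(Float.NaN, 1.0f));
        System.out.println(minWithPrecheck(2.0f, 1.0f));
    }
}
```

The point of the sketch is that the NaN test happens before the min itself, so the common non-NaN path is a classify, an OR, a mask test and a single fmin.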
hotspot:tier3 is fine, so
-------------
PR: https://git.openjdk.org/jdk/pull/11327
More information about the hotspot-compiler-dev mailing list