RFR: 8297359: RISC-V: improve performance of floating Max Min intrinsics

Tue Nov 22 08:31:24 UTC 2022

On Mon, 21 Nov 2022 20:48:00 GMT, Vladimir Kempik <vkempik at openjdk.org> wrote:

> Please review this change.
> 
> It improves performance of Math.min/max intrinsics for Floats and Doubles.
> 
> The main issue in these intrinsics is the requirement to return NaN if any of arguments is NaN. In risc-v, fmin/fmax returns NaN only if both of src registers are NaN ( quiet NaN).
> That requires additional logic to handle the case where only of of src is NaN.
> 
> Here the postcheck with flt (floating less than comparision) and flags analysis replaced with precheck. The precheck is done with fadd-ing srcs into dst and checking the dst for NaN ( with fclass).
> 
> The results on the thead c910:
> 
> The results, thead c910:
> 
> before
> 
> Benchmark                      Mode  Cnt      Score     Error  Units
> FpMinMaxIntrinsics.dMax        avgt   25  54023.827 ± 268.645  ns/op
> FpMinMaxIntrinsics.dMin        avgt   25  54309.850 ± 323.551  ns/op
> FpMinMaxIntrinsics.dMinReduce  avgt   25  42192.140 ±  12.114  ns/op
> FpMinMaxIntrinsics.fMax        avgt   25  53797.657 ±  15.816  ns/op
> FpMinMaxIntrinsics.fMin        avgt   25  54135.710 ± 313.185  ns/op
> FpMinMaxIntrinsics.fMinReduce  avgt   25  42196.156 ±  13.424  ns/op
> MaxMinOptimizeTest.dAdd        avgt   25    650.810 ± 169.998  us/op
> MaxMinOptimizeTest.dMax        avgt   25   4561.967 ±  40.367  us/op
> MaxMinOptimizeTest.dMin        avgt   25   4589.100 ±  75.854  us/op
> MaxMinOptimizeTest.dMul        avgt   25    759.821 ± 240.092  us/op
> MaxMinOptimizeTest.fAdd        avgt   25    300.137 ±  13.495  us/op
> MaxMinOptimizeTest.fMax        avgt   25   4348.885 ±  20.061  us/op
> MaxMinOptimizeTest.fMin        avgt   25   4372.799 ±  27.296  us/op
> MaxMinOptimizeTest.fMul        avgt   25    304.024 ±  12.120  us/op
> 
> after
> 
> Benchmark                      Mode  Cnt      Score     Error  Units
> FpMinMaxIntrinsics.dMax        avgt   25  10545.196 ± 140.137  ns/op
> FpMinMaxIntrinsics.dMin        avgt   25  10454.525 ±   9.972  ns/op
> FpMinMaxIntrinsics.dMinReduce  avgt   25   3104.703 ±   0.892  ns/op
> FpMinMaxIntrinsics.fMax        avgt   25  10449.709 ±   7.284  ns/op
> FpMinMaxIntrinsics.fMin        avgt   25  10445.261 ±   7.206  ns/op
> FpMinMaxIntrinsics.fMinReduce  avgt   25   3104.769 ±   0.951  ns/op
> MaxMinOptimizeTest.dAdd        avgt   25    487.769 ± 170.711  us/op
> MaxMinOptimizeTest.dMax        avgt   25    929.394 ± 158.697  us/op
> MaxMinOptimizeTest.dMin        avgt   25    864.230 ± 284.794  us/op
> MaxMinOptimizeTest.dMul        avgt   25    894.116 ± 342.550  us/op
> MaxMinOptimizeTest.fAdd        avgt   25    284.664 ±   1.446  us/op
> MaxMinOptimizeTest.fMax        avgt   25    384.388 ±  15.004  us/op
> MaxMinOptimizeTest.fMin        avgt   25    371.952 ±  15.295  us/op
> MaxMinOptimizeTest.fMul        avgt   25    305.226 ±  12.467  us/op
> 
> significant improvement
> 
> On hifive u74 ( unmatched) the improvements is less significant:
> 
> hifive:
> 
> before
> Benchmark                      Mode  Cnt      Score    Error  Units
> FpMinMaxIntrinsics.dMax        avgt   25  30219.666 ± 12.878  ns/op
> FpMinMaxIntrinsics.dMin        avgt   25  30242.249 ± 31.374  ns/op
> FpMinMaxIntrinsics.dMinReduce  avgt   25  15394.622 ±  2.803  ns/op
> FpMinMaxIntrinsics.fMax        avgt   25  30150.114 ± 22.421  ns/op
> FpMinMaxIntrinsics.fMin        avgt   25  30149.752 ± 20.813  ns/op
> FpMinMaxIntrinsics.fMinReduce  avgt   25  15396.402 ±  4.251  ns/op
> MaxMinOptimizeTest.dAdd        avgt   25   1143.582 ±  4.444  us/op
> MaxMinOptimizeTest.dMax        avgt   25   2556.317 ±  3.795  us/op
> MaxMinOptimizeTest.dMin        avgt   25   2556.569 ±  2.274  us/op
> MaxMinOptimizeTest.dMul        avgt   25   1142.769 ±  1.593  us/op
> MaxMinOptimizeTest.fAdd        avgt   25    748.688 ±  7.342  us/op
> MaxMinOptimizeTest.fMax        avgt   25   2280.381 ±  1.535  us/op
> MaxMinOptimizeTest.fMin        avgt   25   2280.760 ±  1.532  us/op
> MaxMinOptimizeTest.fMul        avgt   25    748.991 ±  7.261  us/op
> 
> after:
> 
> Benchmark                      Mode  Cnt      Score    Error  Units
> FpMinMaxIntrinsics.dMax        avgt   25  27723.791 ± 22.784  ns/op
> FpMinMaxIntrinsics.dMin        avgt   25  27760.799 ± 45.411  ns/op
> FpMinMaxIntrinsics.dMinReduce  avgt   25  12875.949 ±  2.829  ns/op
> FpMinMaxIntrinsics.fMax        avgt   25  25992.753 ± 23.788  ns/op
> FpMinMaxIntrinsics.fMin        avgt   25  25994.554 ± 32.060  ns/op
> FpMinMaxIntrinsics.fMinReduce  avgt   25  11200.737 ±  2.169  ns/op
> MaxMinOptimizeTest.dAdd        avgt   25   1144.128 ±  4.371  us/op
> MaxMinOptimizeTest.dMax        avgt   25   1968.145 ±  2.346  us/op
> MaxMinOptimizeTest.dMin        avgt   25   1970.249 ±  4.712  us/op
> MaxMinOptimizeTest.dMul        avgt   25   1143.356 ±  2.203  us/op
> MaxMinOptimizeTest.fAdd        avgt   25    748.634 ±  7.229  us/op
> MaxMinOptimizeTest.fMax        avgt   25   1523.719 ±  0.570  us/op
> MaxMinOptimizeTest.fMin        avgt   25   1524.534 ±  1.109  us/op
> MaxMinOptimizeTest.fMul        avgt   25    748.643 ±  7.291  us/op
> 
> 
> fAdd/dAdd and fMul/dMull is unaffected likely due to :
> 
> private double dAddBench(double a, double b) {
>     return Math.max(a, b) + Math.min(a, b);
> }
> 
> private double dMulBench(double a, double b) {
>     return Math.max(a, b) * Math.min(a, b);
> }
> may get reduces to just a + b and a*b respectively without actually using min/max
> 
> Testing : tier1/tier2 in progress, will update this as soon as it finishes

Withdrawn, this version have issues when operating with infinity, I'll redo the change

test FloatMaxVectorTests.MAXReduceFloatMaxVectorTests(float[i * 5]): success
test FloatMaxVectorTests.MAXReduceFloatMaxVectorTests(float[i + 1]): success
test FloatMaxVectorTests.MAXReduceFloatMaxVectorTests(float[cornerCaseValue(i)]): failure
java.lang.AssertionError: at index #2 expected [Infinity] but found [NaN]
        at org.testng.Assert.fail(Assert.java:99)
--
test FloatMaxVectorTests.MAXReduceFloatMaxVectorTestsMasked(float[i * 5], mask[i % 2]): success
test FloatMaxVectorTests.MAXReduceFloatMaxVectorTestsMasked(float[i + 1], mask[i % 2]): success
test FloatMaxVectorTests.MAXReduceFloatMaxVectorTestsMasked(float[cornerCaseValue(i)], mask[i % 2]): failure
java.lang.AssertionError: at index #10 expected [Infinity] but found [NaN]
        at org.testng.Assert.fail(Assert.java:99)
--
test FloatMaxVectorTests.MAXReduceFloatMaxVectorTestsMasked(float[i * 5], mask[true]): success
test FloatMaxVectorTests.MAXReduceFloatMaxVectorTestsMasked(float[i + 1], mask[true]): success
test FloatMaxVectorTests.MAXReduceFloatMaxVectorTestsMasked(float[cornerCaseValue(i)], mask[true]): failure
java.lang.AssertionError: at index #2 expected [Infinity] but found [NaN]
        at org.testng.Assert.fail(Assert.java:99)
--
test FloatMaxVectorTests.MINReduceFloatMaxVectorTests(float[i * 5]): success
test FloatMaxVectorTests.MINReduceFloatMaxVectorTests(float[i + 1]): success
test FloatMaxVectorTests.MINReduceFloatMaxVectorTests(float[cornerCaseValue(i)]): failure
java.lang.AssertionError: at index #2 expected [-Infinity] but found [NaN]
        at org.testng.Assert.fail(Assert.java:99)
--
test FloatMaxVectorTests.MINReduceFloatMaxVectorTestsMasked(float[i * 5], mask[i % 2]): success
test FloatMaxVectorTests.MINReduceFloatMaxVectorTestsMasked(float[i + 1], mask[i % 2]): success
test FloatMaxVectorTests.MINReduceFloatMaxVectorTestsMasked(float[cornerCaseValue(i)], mask[i % 2]): failure
java.lang.AssertionError: at index #2 expected [-Infinity] but found [NaN]
        at org.testng.Assert.fail(Assert.java:99)
--
test FloatMaxVectorTests.MINReduceFloatMaxVectorTestsMasked(float[i * 5], mask[true]): success
test FloatMaxVectorTests.MINReduceFloatMaxVectorTestsMasked(float[i + 1], mask[true]): success
test FloatMaxVectorTests.MINReduceFloatMaxVectorTestsMasked(float[cornerCaseValue(i)], mask[true]): failure
java.lang.AssertionError: at index #2 expected [-Infinity] but found [NaN]
        at org.testng.Assert.fail(Assert.java:99)

-------------

PR: https://git.openjdk.org/jdk/pull/11276