RFR: 8297359: RISC-V: improve performance of floating Max Min intrinsics
Vladimir Kempik
vkempik at openjdk.org
Tue Nov 22 08:31:24 UTC 2022
On Mon, 21 Nov 2022 20:48:00 GMT, Vladimir Kempik <vkempik at openjdk.org> wrote:
> Please review this change.
>
> It improves performance of Math.min/max intrinsics for Floats and Doubles.
>
> The main issue with these intrinsics is the requirement to return NaN if either argument is NaN. On RISC-V, fmin/fmax return NaN (the canonical quiet NaN) only if both source registers are NaN; if only one source is NaN, they return the other operand.
> That requires additional logic to handle the case where only one of the sources is NaN.
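To make the mismatch concrete, here is a small Java model of the scalar RISC-V fmax.s/fmin.s result rules next to Java's Math.max/Math.min. It is an illustrative sketch (the class and method names are hypothetical). It models only the returned value and ignores the invalid-operation flag that signaling NaN inputs raise.

```java
// Hypothetical Java model of the RISC-V F-extension fmax.s / fmin.s result rules.
// Only the returned value is modeled; exception flags (fflags) are ignored.
final class RiscvFminFmaxModel {

    static float fmaxS(float a, float b) {
        if (Float.isNaN(a) && Float.isNaN(b)) return Float.NaN; // both NaN: canonical NaN
        if (Float.isNaN(a)) return b;                           // one NaN: the other operand
        if (Float.isNaN(b)) return a;
        if (a == 0.0f && b == 0.0f) {
            // -0.0 counts as smaller than +0.0: prefer +0.0 if either operand is +0.0.
            return Float.floatToRawIntBits(a) == 0 ? a : b;
        }
        return a > b ? a : b;
    }

    static float fminS(float a, float b) {
        if (Float.isNaN(a) && Float.isNaN(b)) return Float.NaN;
        if (Float.isNaN(a)) return b;
        if (Float.isNaN(b)) return a;
        if (a == 0.0f && b == 0.0f) {
            // Prefer -0.0 if either operand is -0.0.
            return Float.floatToRawIntBits(a) != 0 ? a : b;
        }
        return a < b ? a : b;
    }

    public static void main(String[] args) {
        float nan = Float.NaN;
        System.out.println("fmax.s model(1.0, NaN) = " + fmaxS(1.0f, nan));    // 1.0
        System.out.println("Math.max(1.0, NaN)     = " + Math.max(1.0f, nan)); // NaN
        System.out.println("fmin.s model(NaN, 2.0) = " + fminS(nan, 2.0f));    // 2.0
        System.out.println("Math.min(NaN, 2.0)     = " + Math.min(nan, 2.0f)); // NaN
    }
}
```

The single-NaN rows are exactly the cases the intrinsic has to patch up.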
>
> Here, the postcheck (an flt "floating less than" comparison, which raises the invalid flag for NaN inputs, followed by an analysis of the flags) is replaced with a precheck. The precheck adds the sources into dst with fadd and checks dst for NaN (with fclass).
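The precheck can be modeled in Java roughly as follows. This is an illustrative sketch of the control flow described above, not the HotSpot assembler code. The method names are hypothetical, and the sketch assumes the NaN sum left in dst is returned as is. Math.max/Math.min stand in for fmax/fmin, which agree with them once NaN inputs have been ruled out. (The withdrawal note later in this thread says this version mishandles infinities.)

```java
// Illustrative model of the "fadd + fclass" precheck (names are hypothetical).
final class PrecheckSketch {

    static float maxWithPrecheck(float a, float b) {
        float sum = a + b;          // fadd.s dst, a, b
        if (Float.isNaN(sum)) {     // fclass.s on dst, branch if a NaN class bit is set
            return sum;             // dst already holds a NaN
        }
        return Math.max(a, b);      // stands in for fmax.s dst, a, b
    }

    static float minWithPrecheck(float a, float b) {
        float sum = a + b;          // fadd.s dst, a, b
        if (Float.isNaN(sum)) {
            return sum;
        }
        return Math.min(a, b);      // stands in for fmin.s dst, a, b
    }

    public static void main(String[] args) {
        System.out.println(maxWithPrecheck(1.0f, 2.0f));      // 2.0
        System.out.println(minWithPrecheck(Float.NaN, 2.0f)); // NaN
    }
}
```

The appeal is that the common, non-NaN path is a single fadd, fclass and branch before the fmax/fmin, with no flag reads.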
>
> The results on the thead c910:
>
> before
>
> Benchmark                      Mode  Cnt      Score     Error  Units
> FpMinMaxIntrinsics.dMax        avgt   25  54023.827 ± 268.645  ns/op
> FpMinMaxIntrinsics.dMin        avgt   25  54309.850 ± 323.551  ns/op
> FpMinMaxIntrinsics.dMinReduce  avgt   25  42192.140 ±  12.114  ns/op
> FpMinMaxIntrinsics.fMax        avgt   25  53797.657 ±  15.816  ns/op
> FpMinMaxIntrinsics.fMin        avgt   25  54135.710 ± 313.185  ns/op
> FpMinMaxIntrinsics.fMinReduce  avgt   25  42196.156 ±  13.424  ns/op
> MaxMinOptimizeTest.dAdd        avgt   25    650.810 ± 169.998  us/op
> MaxMinOptimizeTest.dMax        avgt   25   4561.967 ±  40.367  us/op
> MaxMinOptimizeTest.dMin        avgt   25   4589.100 ±  75.854  us/op
> MaxMinOptimizeTest.dMul        avgt   25    759.821 ± 240.092  us/op
> MaxMinOptimizeTest.fAdd        avgt   25    300.137 ±  13.495  us/op
> MaxMinOptimizeTest.fMax        avgt   25   4348.885 ±  20.061  us/op
> MaxMinOptimizeTest.fMin        avgt   25   4372.799 ±  27.296  us/op
> MaxMinOptimizeTest.fMul        avgt   25    304.024 ±  12.120  us/op
>
> after
>
> Benchmark                      Mode  Cnt      Score     Error  Units
> FpMinMaxIntrinsics.dMax        avgt   25  10545.196 ± 140.137  ns/op
> FpMinMaxIntrinsics.dMin        avgt   25  10454.525 ±   9.972  ns/op
> FpMinMaxIntrinsics.dMinReduce  avgt   25   3104.703 ±   0.892  ns/op
> FpMinMaxIntrinsics.fMax        avgt   25  10449.709 ±   7.284  ns/op
> FpMinMaxIntrinsics.fMin        avgt   25  10445.261 ±   7.206  ns/op
> FpMinMaxIntrinsics.fMinReduce  avgt   25   3104.769 ±   0.951  ns/op
> MaxMinOptimizeTest.dAdd        avgt   25    487.769 ± 170.711  us/op
> MaxMinOptimizeTest.dMax        avgt   25    929.394 ± 158.697  us/op
> MaxMinOptimizeTest.dMin        avgt   25    864.230 ± 284.794  us/op
> MaxMinOptimizeTest.dMul        avgt   25    894.116 ± 342.550  us/op
> MaxMinOptimizeTest.fAdd        avgt   25    284.664 ±   1.446  us/op
> MaxMinOptimizeTest.fMax        avgt   25    384.388 ±  15.004  us/op
> MaxMinOptimizeTest.fMin        avgt   25    371.952 ±  15.295  us/op
> MaxMinOptimizeTest.fMul        avgt   25    305.226 ±  12.467  us/op
>
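> A significant improvement: roughly 5x on FpMinMaxIntrinsics.dMax/dMin/fMax/fMin, about 13.6x on dMinReduce/fMinReduce, and from about 5x (double) to almost 12x (float) on the MaxMinOptimizeTest max/min benchmarks.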
>
> On the HiFive Unmatched (SiFive U74 cores), the improvement is less significant, roughly 1.1x to 1.5x:
>
> before
>
> Benchmark                      Mode  Cnt      Score     Error  Units
> FpMinMaxIntrinsics.dMax        avgt   25  30219.666 ±  12.878  ns/op
> FpMinMaxIntrinsics.dMin        avgt   25  30242.249 ±  31.374  ns/op
> FpMinMaxIntrinsics.dMinReduce  avgt   25  15394.622 ±   2.803  ns/op
> FpMinMaxIntrinsics.fMax        avgt   25  30150.114 ±  22.421  ns/op
> FpMinMaxIntrinsics.fMin        avgt   25  30149.752 ±  20.813  ns/op
> FpMinMaxIntrinsics.fMinReduce  avgt   25  15396.402 ±   4.251  ns/op
> MaxMinOptimizeTest.dAdd        avgt   25   1143.582 ±   4.444  us/op
> MaxMinOptimizeTest.dMax        avgt   25   2556.317 ±   3.795  us/op
> MaxMinOptimizeTest.dMin        avgt   25   2556.569 ±   2.274  us/op
> MaxMinOptimizeTest.dMul        avgt   25   1142.769 ±   1.593  us/op
> MaxMinOptimizeTest.fAdd        avgt   25    748.688 ±   7.342  us/op
> MaxMinOptimizeTest.fMax        avgt   25   2280.381 ±   1.535  us/op
> MaxMinOptimizeTest.fMin        avgt   25   2280.760 ±   1.532  us/op
> MaxMinOptimizeTest.fMul        avgt   25    748.991 ±   7.261  us/op
>
> after
>
> Benchmark                      Mode  Cnt      Score     Error  Units
> FpMinMaxIntrinsics.dMax        avgt   25  27723.791 ±  22.784  ns/op
> FpMinMaxIntrinsics.dMin        avgt   25  27760.799 ±  45.411  ns/op
> FpMinMaxIntrinsics.dMinReduce  avgt   25  12875.949 ±   2.829  ns/op
> FpMinMaxIntrinsics.fMax        avgt   25  25992.753 ±  23.788  ns/op
> FpMinMaxIntrinsics.fMin        avgt   25  25994.554 ±  32.060  ns/op
> FpMinMaxIntrinsics.fMinReduce  avgt   25  11200.737 ±   2.169  ns/op
> MaxMinOptimizeTest.dAdd        avgt   25   1144.128 ±   4.371  us/op
> MaxMinOptimizeTest.dMax        avgt   25   1968.145 ±   2.346  us/op
> MaxMinOptimizeTest.dMin        avgt   25   1970.249 ±   4.712  us/op
> MaxMinOptimizeTest.dMul        avgt   25   1143.356 ±   2.203  us/op
> MaxMinOptimizeTest.fAdd        avgt   25    748.634 ±   7.229  us/op
> MaxMinOptimizeTest.fMax        avgt   25   1523.719 ±   0.570  us/op
> MaxMinOptimizeTest.fMin        avgt   25   1524.534 ±   1.109  us/op
> MaxMinOptimizeTest.fMul        avgt   25    748.643 ±   7.291  us/op
>
>
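The before/after ratios behind these comparisons can be recomputed directly from the avgt scores. Lower is better, so a before/after ratio above 1 means the new code is faster. Below is a minimal Java sketch over a few rows copied from the tables above; the class name and row labels are illustrative.

```java
// Recomputes before/after speedups for a few rows copied from the JMH tables.
final class SpeedupTable {

    // avgt scores: lower is better, so before / after > 1 means "faster after".
    static double speedup(double before, double after) {
        return before / after;
    }

    public static void main(String[] args) {
        String[] rows = {
            "C910    FpMinMaxIntrinsics.dMax",
            "C910    FpMinMaxIntrinsics.fMinReduce",
            "C910    MaxMinOptimizeTest.fMax",
            "HiFive  FpMinMaxIntrinsics.dMax",
            "HiFive  FpMinMaxIntrinsics.fMinReduce",
            "HiFive  MaxMinOptimizeTest.fMax",
        };
        // {before, after}, in the units of the corresponding table row.
        double[][] scores = {
            {54023.827, 10545.196},
            {42196.156, 3104.769},
            {4348.885, 384.388},
            {30219.666, 27723.791},
            {15396.402, 11200.737},
            {2280.381, 1523.719},
        };
        for (int i = 0; i < rows.length; i++) {
            System.out.printf("%-40s %6.2fx%n", rows[i], speedup(scores[i][0], scores[i][1]));
        }
    }
}
```

This ignores the error columns. Some C910 MaxMinOptimizeTest rows (for example dMin after, 864.230 ± 284.794 us/op) have wide error bars, so their ratios are rougher than the FpMinMaxIntrinsics ones.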
> fAdd/dAdd and fMul/dMul are unaffected, likely because these helpers:
>
> private double dAddBench(double a, double b) {
> return Math.max(a, b) + Math.min(a, b);
> }
>
> private double dMulBench(double a, double b) {
> return Math.max(a, b) * Math.min(a, b);
> }
> may be reduced to just a + b and a * b, respectively, without actually using min/max.
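Such a fold would be legal. For any inputs, Math.max(a, b) and Math.min(a, b) are either both NaN or the two inputs in some order (+0.0 and -0.0 included), and IEEE addition and multiplication are commutative. The following Java sketch checks this bit for bit over a few special values. It only shows that the identity holds; it does not show that C2 actually performs the fold on RISC-V, which remains the hypothesis stated above. The class and helper names are hypothetical.

```java
// Checks, for a few special values, that the benchmark expressions produce
// bit-for-bit the same result as plain a + b and a * b.
final class MaxMinFoldCheck {

    // Same expressions as the benchmark helpers quoted above.
    static double dAddBench(double a, double b) { return Math.max(a, b) + Math.min(a, b); }
    static double dMulBench(double a, double b) { return Math.max(a, b) * Math.min(a, b); }

    // doubleToLongBits maps every NaN to one canonical pattern, so NaN matches NaN,
    // while +0.0 and -0.0 remain distinct.
    static boolean sameBits(double x, double y) {
        return Double.doubleToLongBits(x) == Double.doubleToLongBits(y);
    }

    static int countMismatches(double[] samples) {
        int mismatches = 0;
        for (double a : samples) {
            for (double b : samples) {
                if (!sameBits(dAddBench(a, b), a + b)) mismatches++;
                if (!sameBits(dMulBench(a, b), a * b)) mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        double[] samples = {1.5, -2.0, 0.0, -0.0, Double.NaN, Double.MIN_VALUE,
                            Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY};
        System.out.println("mismatches: " + countMismatches(samples)); // 0
    }
}
```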
>
> Testing: tier1/tier2 in progress; I will update this as soon as it finishes.
Withdrawn: this version has issues when operating with infinities. I'll redo the change.
test FloatMaxVectorTests.MAXReduceFloatMaxVectorTests(float[i * 5]): success
test FloatMaxVectorTests.MAXReduceFloatMaxVectorTests(float[i + 1]): success
test FloatMaxVectorTests.MAXReduceFloatMaxVectorTests(float[cornerCaseValue(i)]): failure
java.lang.AssertionError: at index #2 expected [Infinity] but found [NaN]
at org.testng.Assert.fail(Assert.java:99)
--
test FloatMaxVectorTests.MAXReduceFloatMaxVectorTestsMasked(float[i * 5], mask[i % 2]): success
test FloatMaxVectorTests.MAXReduceFloatMaxVectorTestsMasked(float[i + 1], mask[i % 2]): success
test FloatMaxVectorTests.MAXReduceFloatMaxVectorTestsMasked(float[cornerCaseValue(i)], mask[i % 2]): failure
java.lang.AssertionError: at index #10 expected [Infinity] but found [NaN]
at org.testng.Assert.fail(Assert.java:99)
--
test FloatMaxVectorTests.MAXReduceFloatMaxVectorTestsMasked(float[i * 5], mask[true]): success
test FloatMaxVectorTests.MAXReduceFloatMaxVectorTestsMasked(float[i + 1], mask[true]): success
test FloatMaxVectorTests.MAXReduceFloatMaxVectorTestsMasked(float[cornerCaseValue(i)], mask[true]): failure
java.lang.AssertionError: at index #2 expected [Infinity] but found [NaN]
at org.testng.Assert.fail(Assert.java:99)
--
test FloatMaxVectorTests.MINReduceFloatMaxVectorTests(float[i * 5]): success
test FloatMaxVectorTests.MINReduceFloatMaxVectorTests(float[i + 1]): success
test FloatMaxVectorTests.MINReduceFloatMaxVectorTests(float[cornerCaseValue(i)]): failure
java.lang.AssertionError: at index #2 expected [-Infinity] but found [NaN]
at org.testng.Assert.fail(Assert.java:99)
--
test FloatMaxVectorTests.MINReduceFloatMaxVectorTestsMasked(float[i * 5], mask[i % 2]): success
test FloatMaxVectorTests.MINReduceFloatMaxVectorTestsMasked(float[i + 1], mask[i % 2]): success
test FloatMaxVectorTests.MINReduceFloatMaxVectorTestsMasked(float[cornerCaseValue(i)], mask[i % 2]): failure
java.lang.AssertionError: at index #2 expected [-Infinity] but found [NaN]
at org.testng.Assert.fail(Assert.java:99)
--
test FloatMaxVectorTests.MINReduceFloatMaxVectorTestsMasked(float[i * 5], mask[true]): success
test FloatMaxVectorTests.MINReduceFloatMaxVectorTestsMasked(float[i + 1], mask[true]): success
test FloatMaxVectorTests.MINReduceFloatMaxVectorTestsMasked(float[cornerCaseValue(i)], mask[true]): failure
java.lang.AssertionError: at index #2 expected [-Infinity] but found [NaN]
at org.testng.Assert.fail(Assert.java:99)
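All of these failures involve the cornerCaseValue(i) inputs and report NaN where Infinity (MAX) or -Infinity (MIN) was expected. One plausible explanation, consistent with the note about infinities above, is that the fadd-based precheck cannot distinguish a real NaN input from +Infinity + -Infinity, which is also NaN under IEEE 754. This is a hedged reading of the logs, not a confirmed diagnosis. The Java sketch below only shows the arithmetic, and its helper names are hypothetical.

```java
// Shows why "a + b is NaN" is not equivalent to "a or b is NaN"
// (hypothetical helpers modeling the idea, not the HotSpot code).
final class InfinityPrecheckFailure {

    // The withdrawn idea: treat a NaN sum as evidence of a NaN input.
    static boolean sumIsNaN(float a, float b) {
        return Float.isNaN(a + b);
    }

    // For contrast: testing each operand individually.
    static boolean eitherIsNaN(float a, float b) {
        return Float.isNaN(a) || Float.isNaN(b);
    }

    public static void main(String[] args) {
        float pInf = Float.POSITIVE_INFINITY;
        float nInf = Float.NEGATIVE_INFINITY;
        System.out.println("+Inf + -Inf       = " + (pInf + nInf));           // NaN
        System.out.println("sumIsNaN          = " + sumIsNaN(pInf, nInf));    // true
        System.out.println("eitherIsNaN       = " + eitherIsNaN(pInf, nInf)); // false
        System.out.println("Math.max expected = " + Math.max(pInf, nInf));    // Infinity
        System.out.println("Math.min expected = " + Math.min(pInf, nInf));    // -Infinity
    }
}
```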
-------------
PR: https://git.openjdk.org/jdk/pull/11276
More information about the hotspot-compiler-dev mailing list