Pre-Review: improving Math.min/max on floats
Vladimir Kempik
vladimir.kempik at gmail.com
Wed Nov 23 09:10:37 UTC 2022
Hello
I got results for the new version [1].
It shows excellent perf improvements on thead and moderate ones on hifive (and it’s better than both previous versions on hifive).
thead c910
before
Benchmark                      Mode  Cnt      Score     Error  Units
FpMinMaxIntrinsics.dMax        avgt   25  53752.831 ±  97.198  ns/op
FpMinMaxIntrinsics.dMin        avgt   25  53707.229 ± 177.559  ns/op
FpMinMaxIntrinsics.dMinReduce  avgt   25  42805.985 ±   9.901  ns/op
FpMinMaxIntrinsics.fMax        avgt   25  53449.568 ± 215.294  ns/op
FpMinMaxIntrinsics.fMin        avgt   25  53504.106 ± 180.833  ns/op
FpMinMaxIntrinsics.fMinReduce  avgt   25  42794.579 ±   7.013  ns/op
MaxMinOptimizeTest.dAdd        avgt   25    381.138 ±   5.692  us/op
MaxMinOptimizeTest.dMax        avgt   25   4575.094 ±  17.065  us/op
MaxMinOptimizeTest.dMin        avgt   25   4584.648 ±  18.561  us/op
MaxMinOptimizeTest.dMul        avgt   25    384.615 ±   7.751  us/op
MaxMinOptimizeTest.fAdd        avgt   25    318.076 ±   3.308  us/op
MaxMinOptimizeTest.fMax        avgt   25   4405.724 ±  20.353  us/op
MaxMinOptimizeTest.fMin        avgt   25   4421.652 ±  18.029  us/op
MaxMinOptimizeTest.fMul        avgt   25    305.462 ±  19.437  us/op
2fclass_new
Benchmark                      Mode  Cnt      Score     Error  Units
FpMinMaxIntrinsics.dMax        avgt   25  10712.246 ±   5.607  ns/op
FpMinMaxIntrinsics.dMin        avgt   25  10732.655 ±  41.894  ns/op
FpMinMaxIntrinsics.dMinReduce  avgt   25   3248.106 ±   2.143  ns/op
FpMinMaxIntrinsics.fMax        avgt   25  10707.084 ±   3.276  ns/op
FpMinMaxIntrinsics.fMin        avgt   25  10719.771 ±  14.864  ns/op
FpMinMaxIntrinsics.fMinReduce  avgt   25   3274.775 ±   0.996  ns/op
MaxMinOptimizeTest.dAdd        avgt   25    383.720 ±   8.849  us/op
MaxMinOptimizeTest.dMax        avgt   25    429.345 ±  11.160  us/op
MaxMinOptimizeTest.dMin        avgt   25    439.980 ±   3.757  us/op
MaxMinOptimizeTest.dMul        avgt   25    390.126 ±  10.258  us/op
MaxMinOptimizeTest.fAdd        avgt   25    300.005 ±  18.206  us/op
MaxMinOptimizeTest.fMax        avgt   25    370.467 ±   6.054  us/op
MaxMinOptimizeTest.fMin        avgt   25    375.134 ±   4.568  us/op
MaxMinOptimizeTest.fMul        avgt   25    305.344 ±  18.307  us/op
hifive
before
Benchmark                      Mode  Cnt      Score     Error  Units
FpMinMaxIntrinsics.dMax        avgt   25  30234.224 ±  16.744  ns/op
FpMinMaxIntrinsics.dMin        avgt   25  30227.686 ±  15.389  ns/op
FpMinMaxIntrinsics.dMinReduce  avgt   25  15766.749 ±   3.724  ns/op
FpMinMaxIntrinsics.fMax        avgt   25  30140.092 ±  10.243  ns/op
FpMinMaxIntrinsics.fMin        avgt   25  30149.470 ±  34.041  ns/op
FpMinMaxIntrinsics.fMinReduce  avgt   25  15760.770 ±   5.415  ns/op
MaxMinOptimizeTest.dAdd        avgt   25   1155.234 ±   4.603  us/op
MaxMinOptimizeTest.dMax        avgt   25   2597.897 ±   3.307  us/op
MaxMinOptimizeTest.dMin        avgt   25   2599.183 ±   3.806  us/op
MaxMinOptimizeTest.dMul        avgt   25   1155.281 ±   1.813  us/op
MaxMinOptimizeTest.fAdd        avgt   25    750.967 ±   7.254  us/op
MaxMinOptimizeTest.fMax        avgt   25   2305.085 ±   1.556  us/op
MaxMinOptimizeTest.fMin        avgt   25   2305.306 ±   1.478  us/op
MaxMinOptimizeTest.fMul        avgt   25    750.623 ±   7.357  us/op
2fclass_new
Benchmark                      Mode  Cnt      Score     Error  Units
FpMinMaxIntrinsics.dMax        avgt   25  23599.547 ±  29.571  ns/op
FpMinMaxIntrinsics.dMin        avgt   25  23593.236 ±  18.456  ns/op
FpMinMaxIntrinsics.dMinReduce  avgt   25   8630.201 ±   1.353  ns/op
FpMinMaxIntrinsics.fMax        avgt   25  23496.337 ±  18.340  ns/op
FpMinMaxIntrinsics.fMin        avgt   25  23477.881 ±   8.545  ns/op
FpMinMaxIntrinsics.fMinReduce  avgt   25   8629.135 ±   0.869  ns/op
MaxMinOptimizeTest.dAdd        avgt   25   1155.479 ±   4.938  us/op
MaxMinOptimizeTest.dMax        avgt   25   1560.323 ±   3.077  us/op
MaxMinOptimizeTest.dMin        avgt   25   1558.668 ±   2.421  us/op
MaxMinOptimizeTest.dMul        avgt   25   1154.919 ±   2.077  us/op
MaxMinOptimizeTest.fAdd        avgt   25    751.325 ±   7.169  us/op
MaxMinOptimizeTest.fMax        avgt   25   1306.131 ±   1.102  us/op
MaxMinOptimizeTest.fMin        avgt   25   1306.134 ±   0.957  us/op
MaxMinOptimizeTest.fMul        avgt   25    750.968 ±   7.334  us/op
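
To put "excellent" and "moderate" into numbers, here is a small sketch that divides the "before" score by the "2fclass_new" score for a few rows of the tables above. The class and method names are made up for illustration; only the scores come from the benchmark output.

```java
import java.util.Locale;

// Illustrative only: SpeedupSketch and speedup() are hypothetical names.
final class SpeedupSketch {
    // Ratio of the "before" score to the "2fclass_new" score.
    // Scores are average time per op, so a ratio above 1 means the new version is faster.
    static double speedup(double before, double after) {
        return before / after;
    }

    public static void main(String[] args) {
        // Scores copied from the tables above (dMax rows only).
        System.out.printf(Locale.ROOT, "thead  FpMinMaxIntrinsics.dMax: %.2fx%n",
                speedup(53752.831, 10712.246));
        System.out.printf(Locale.ROOT, "thead  MaxMinOptimizeTest.dMax: %.2fx%n",
                speedup(4575.094, 429.345));
        System.out.printf(Locale.ROOT, "hifive FpMinMaxIntrinsics.dMax: %.2fx%n",
                speedup(30234.224, 23599.547));
        System.out.printf(Locale.ROOT, "hifive MaxMinOptimizeTest.dMax: %.2fx%n",
                speedup(2597.897, 1560.323));
    }
}
```

For these rows that works out to roughly 5x and 10.7x on thead, against roughly 1.3x and 1.7x on hifive. The dAdd/dMul/fAdd/fMul rows are unchanged within the error margin, as expected for code the patch does not touch.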
Regards, Vladimir
[1] https://github.com/VladimirKempik/jdk/commit/fda44a8521f19b25d0fe155531d4bd1e3d7870a5
> On 22 Nov 2022, at 12:05, Vladimir Kempik <vladimir.kempik at gmail.com> wrote:
>
> Hello Fei
>
> I think I can reduce the number of instructions in the second version, but I need a second temp register for that (to OR the two fclass results and check for NaN just once).
> It would then look like this:
>
> is_double ? fclass_d(t0, src1)
>           : fclass_s(t0, src1);
> is_double ? fclass_d(t1, src2)
>           : fclass_s(t1, src2);
> orr(t0, t0, t1);            // combine both class masks so a NaN in either operand is seen
> andi(t0, t0, 0b1100000000); // if either src is a quiet or signaling NaN, return their sum
> beqz(t0, Compare);
> is_double ? fadd_d(dst, src1, src2)
>           : fadd_s(dst, src1, src2);
> j(Done);
>
> bind(Compare);
>
> Any hints on how to get a second temp register?
>
> Regards, Vladimir
>
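
For readers less familiar with fclass: it writes a one-hot 10-bit class mask, where bit 8 means signaling NaN and bit 9 means quiet NaN, which is what the 0b1100000000 constant selects. Below is a rough software model of the single-precision NaN check from the message above, written in Java. All names are hypothetical, Math.max stands in for the fmax.s compare path, and the two class masks are combined with OR so that a NaN in either operand is detected.

```java
// Illustrative only: FclassSketch and its methods are hypothetical names,
// not HotSpot code.
final class FclassSketch {
    // Bit 8: signaling NaN, bit 9: quiet NaN (RISC-V fclass result layout).
    static final int NAN_MASK = 0b1100000000;

    // Software model of RISC-V fclass.s: returns a one-hot 10-bit class mask.
    static int fclass(float f) {
        int bits = Float.floatToRawIntBits(f);
        boolean neg = bits < 0;
        int exp = (bits >>> 23) & 0xFF;
        int frac = bits & 0x7FFFFF;
        if (exp == 0xFF) {
            if (frac == 0) return neg ? 1 << 0 : 1 << 7;      // -inf / +inf
            return (frac & 0x400000) != 0 ? 1 << 9 : 1 << 8;  // quiet / signaling NaN
        }
        if (exp == 0) {
            if (frac == 0) return neg ? 1 << 3 : 1 << 4;      // -0 / +0
            return neg ? 1 << 2 : 1 << 5;                      // negative / positive subnormal
        }
        return neg ? 1 << 1 : 1 << 6;                          // negative / positive normal
    }

    // Model of the proposed max sequence: classify both inputs, test once for NaN.
    static float max(float src1, float src2) {
        int t0 = fclass(src1);
        int t1 = fclass(src2);
        t0 |= t1;                       // a NaN in either operand sets bit 8 or 9
        if ((t0 & NAN_MASK) != 0) {
            return src1 + src2;         // the sum of anything with a NaN is NaN
        }
        return Math.max(src1, src2);    // assumption: stand-in for fmax.s
    }
}
```

Because the inputs are classified rather than their sum, a pair such as +Infinity and -Infinity takes the compare path and returns +Infinity.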
>> On 22 Nov 2022, at 11:28, Vladimir Kempik <vladimir.kempik at gmail.com> wrote:
>>
>> Hello
>>
>> I found an issue with the fadd+fclass version:
>>
>> jdk/incubator/vector/FloatMaxVectorTests.java
>>
>> test FloatMaxVectorTests.MAXReduceFloatMaxVectorTests(float[i * 5]): success
>> test FloatMaxVectorTests.MAXReduceFloatMaxVectorTests(float[i + 1]): success
>> test FloatMaxVectorTests.MAXReduceFloatMaxVectorTests(float[cornerCaseValue(i)]): failure
>> java.lang.AssertionError: at index #2 expected [Infinity] but found [NaN]
>> at org.testng.Assert.fail(Assert.java:99)
>> --
>> test FloatMaxVectorTests.MAXReduceFloatMaxVectorTestsMasked(float[i * 5], mask[i % 2]): success
>> test FloatMaxVectorTests.MAXReduceFloatMaxVectorTestsMasked(float[i + 1], mask[i % 2]): success
>> test FloatMaxVectorTests.MAXReduceFloatMaxVectorTestsMasked(float[cornerCaseValue(i)], mask[i % 2]): failure
>> java.lang.AssertionError: at index #10 expected [Infinity] but found [NaN]
>> at org.testng.Assert.fail(Assert.java:99)
>> --
>> test FloatMaxVectorTests.MAXReduceFloatMaxVectorTestsMasked(float[i * 5], mask[true]): success
>> test FloatMaxVectorTests.MAXReduceFloatMaxVectorTestsMasked(float[i + 1], mask[true]): success
>> test FloatMaxVectorTests.MAXReduceFloatMaxVectorTestsMasked(float[cornerCaseValue(i)], mask[true]): failure
>> java.lang.AssertionError: at index #2 expected [Infinity] but found [NaN]
>> at org.testng.Assert.fail(Assert.java:99)
>> --
>> test FloatMaxVectorTests.MINReduceFloatMaxVectorTests(float[i * 5]): success
>> test FloatMaxVectorTests.MINReduceFloatMaxVectorTests(float[i + 1]): success
>> test FloatMaxVectorTests.MINReduceFloatMaxVectorTests(float[cornerCaseValue(i)]): failure
>> java.lang.AssertionError: at index #2 expected [-Infinity] but found [NaN]
>> at org.testng.Assert.fail(Assert.java:99)
>> --
>> test FloatMaxVectorTests.MINReduceFloatMaxVectorTestsMasked(float[i * 5], mask[i % 2]): success
>> test FloatMaxVectorTests.MINReduceFloatMaxVectorTestsMasked(float[i + 1], mask[i % 2]): success
>> test FloatMaxVectorTests.MINReduceFloatMaxVectorTestsMasked(float[cornerCaseValue(i)], mask[i % 2]): failure
>> java.lang.AssertionError: at index #2 expected [-Infinity] but found [NaN]
>> at org.testng.Assert.fail(Assert.java:99)
>> --
>> test FloatMaxVectorTests.MINReduceFloatMaxVectorTestsMasked(float[i * 5], mask[true]): success
>> test FloatMaxVectorTests.MINReduceFloatMaxVectorTestsMasked(float[i + 1], mask[true]): success
>> test FloatMaxVectorTests.MINReduceFloatMaxVectorTestsMasked(float[cornerCaseValue(i)], mask[true]): failure
>> java.lang.AssertionError: at index #2 expected [-Infinity] but found [NaN]
>> at org.testng.Assert.fail(Assert.java:99)
>>
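
The fadd+fclass code itself is not quoted in this thread, so the following is only a guess at the cause of the failures above. If that version adds the two inputs first and treats a NaN sum as "one of the inputs was NaN", it goes wrong for +Infinity and -Infinity: neither is NaN, but their sum is. A minimal Java sketch under that assumption (the class and method names are hypothetical):

```java
// Illustrative only: a hypothetical model of "add first, then check the sum for NaN".
final class InfinityCornerCase {
    static float maxViaSumCheck(float a, float b) {
        float sum = a + b;
        if (Float.isNaN(sum)) {
            return sum;             // wrongly assumes an input was NaN
        }
        return Math.max(a, b);
    }

    public static void main(String[] args) {
        float pInf = Float.POSITIVE_INFINITY;
        float nInf = Float.NEGATIVE_INFINITY;
        System.out.println(pInf + nInf);                 // NaN
        System.out.println(Math.max(pInf, nInf));        // Infinity
        System.out.println(maxViaSumCheck(pInf, nInf));  // NaN
    }
}
```

That would match the "expected [Infinity] but found [NaN]" and "expected [-Infinity] but found [NaN]" messages on the cornerCaseValue inputs, and it is the case that classifying the inputs instead of the sum avoids.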