RFR: 8279508: Auto-vectorize Math.round API [v2]
Vladimir Ivanov
vlivanov at openjdk.java.net
Wed Jan 19 21:37:50 UTC 2022
On Wed, 19 Jan 2022 17:38:25 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
>> Summary of changes:
>> - Intrinsify Math.round(float) and Math.round(double) APIs.
>> - Extend auto-vectorizer to infer vector operations on encountering scalar IR nodes for above intrinsics.
>> - Test creation using new IR testing framework.
>>
>> Following are the performance number of a JMH micro included with the patch
>>
>> Test System: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (Icelake Server)
>>
>> | | BASELINE AVX2 | WithOpt AVX2 | Gain (opt/baseline) | Baseline AVX3 | Withopt AVX3 | Gain (opt/baseline)
>> -- | -- | -- | -- | -- | -- | -- | --
>> Benchmark | ARRAYLEN | Score (ops/ms) | Score (ops/ms) | | Score (ops/ms) | Score (ops/ms) |
>> FpRoundingBenchmark.test_round_double | 1024 | 518.532 | 1364.066 | 2.630630318 | 512.908 | 4292.11 | 8.368186887
>> FpRoundingBenchmark.test_round_double | 2048 | 270.137 | 830.986 | 3.076165057 | 273.159 | 2459.116 | 9.002507697
>> FpRoundingBenchmark.test_round_float | 1024 | 752.436 | 7780.905 | 10.34095259 | 752.49 | 9506.694 | 12.63364829
>> FpRoundingBenchmark.test_round_float | 2048 | 389.499 | 4113.046 | 10.55983712 | 389.63 | 4863.673 | 12.48279907
>>
>> Kindly review and share your feedback.
>>
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
> 8279508: Adding a test for scalar intrinsification.
There are already `RoundFloat`, `RoundDouble`, and `RoundDoubleMode` nodes defined.
Though `RoundFloat` and `RoundDouble` are legacy nodes used only on x86-32, `RoundDoubleMode` supports multiple rounding modes and is amenable to auto-vectorization.
What do you think about the following alternative?
Reuse `RoundDoubleMode` (with a new rounding mode) and introduce `RoundFloatMode`.
Special rounding rules is not the only peculiarity of `Math.round()`. It also converts the result to an integral type. It can be represented as `ConvF2I (RoundFloatMode f #rmode)` / `ConvD2L (RoundDoubleMode d #rmode)`. In scalar case, it can be matched as a single AD instruction.
Auto-vectorizer can then convert it to `VectorCastF2X (RoundFloatModeV vf #rmode)` / `VectorCastD2X (RoundDoubleModeV vd #rmode)` and match it in a similar manner.
test/hotspot/jtreg/compiler/c2/cr6340864/TestFloatVect.java line 33:
> 31: * @run main/othervm -Xbatch -XX:CompileCommand=exclude,*::test() -Xmx128m -XX:MaxVectorSize=16 compiler.c2.cr6340864.TestFloatVect
> 32: * @run main/othervm -Xbatch -XX:CompileCommand=exclude,*::test() -Xmx128m -XX:MaxVectorSize=32 compiler.c2.cr6340864.TestFloatVect
> 33: * @run main/othervm -Xbatch -XX:CompileCommand=exclude,*::test() -XX:TieredStopAtLevel=2 -Xmx128m -XX:MaxVectorSize=32 compiler.c2.cr6340864.TestFloatVect
What's the purpose of `-XX:TieredStopAtLevel=2` from testing perspective?
-------------
PR: https://git.openjdk.java.net/jdk/pull/7094
More information about the core-libs-dev
mailing list