RFR: 8279508: Auto-vectorize Math.round API [v2]

Wed Jan 19 21:37:50 UTC 2022

On Wed, 19 Jan 2022 17:38:25 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Summary of changes:
>> - Intrinsify Math.round(float) and Math.round(double) APIs.
>> - Extend auto-vectorizer to infer vector operations on encountering scalar IR nodes for above intrinsics.
>> - Test creation using new IR testing framework.
>> 
>> The following are the performance numbers from a JMH microbenchmark included with the patch:
>> 
>> Test System: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (Icelake Server)
>> 
>> Benchmark | ARRAYLEN | Baseline AVX2 (ops/ms) | WithOpt AVX2 (ops/ms) | Gain (opt/baseline) | Baseline AVX3 (ops/ms) | WithOpt AVX3 (ops/ms) | Gain (opt/baseline)
>> -- | -- | -- | -- | -- | -- | -- | --
>> FpRoundingBenchmark.test_round_double | 1024 | 518.532 | 1364.066 | 2.630630318 | 512.908 | 4292.11 | 8.368186887
>> FpRoundingBenchmark.test_round_double | 2048 | 270.137 | 830.986 | 3.076165057 | 273.159 | 2459.116 | 9.002507697
>> FpRoundingBenchmark.test_round_float | 1024 | 752.436 | 7780.905 | 10.34095259 | 752.49 | 9506.694 | 12.63364829
>> FpRoundingBenchmark.test_round_float | 2048 | 389.499 | 4113.046 | 10.55983712 | 389.63 | 4863.673 | 12.48279907
>> 
>> Kindly review and share your feedback.
>> 
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
> 
>   8279508: Adding a test for scalar intrinsification.

There are already `RoundFloat`, `RoundDouble`, and `RoundDoubleMode` nodes defined.

Though `RoundFloat` and `RoundDouble` are legacy nodes used only on x86-32, `RoundDoubleMode` supports multiple rounding modes and is amenable to auto-vectorization.

What do you think about the following alternative? 

Reuse `RoundDoubleMode` (with a new rounding mode) and introduce `RoundFloatMode`.

Special rounding rules are not the only peculiarity of `Math.round()`. It also converts the result to an integral type. It can be represented as `ConvF2I (RoundFloatMode f #rmode)` / `ConvD2L (RoundDoubleMode d #rmode)`. In the scalar case, it can be matched as a single AD instruction.
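
To illustrate the two-step decomposition at the Java level, here is a minimal sketch. The class and method names are hypothetical, and computing the rounding step in `double` is my own choice to avoid precision loss when adding 0.5; it is not how C2 would implement the proposed nodes.

```java
final class RoundSketch {
    // Step 1: hypothetical stand-in for RoundFloatMode with a
    // "nearest, ties toward positive infinity" mode. The result is still
    // a floating-point value. The addition is done in double (an assumption
    // of this sketch) so that inputs such as 0.49999997f are not rounded up.
    static double roundToIntegral(float f) {
        return Math.floor((double) f + 0.5);
    }

    // Step 2: stand-in for ConvF2I. Java's narrowing conversion maps NaN to 0
    // and saturates out-of-range values, which matches the special cases
    // documented for Math.round(float).
    static int roundViaTwoSteps(float f) {
        return (int) roundToIntegral(f);
    }

    public static void main(String[] args) {
        float[] samples = {2.5f, -2.5f, 0.49999997f, Float.NaN, Float.POSITIVE_INFINITY};
        for (float f : samples) {
            System.out.println(f + " -> " + roundViaTwoSteps(f)
                    + " (Math.round: " + Math.round(f) + ")");
        }
    }
}
```

The point of the split is that only step 1 depends on the rounding mode, while step 2 is an ordinary conversion.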

The auto-vectorizer can then convert it to `VectorCastF2X (RoundFloatModeV vf #rmode)` / `VectorCastD2X (RoundDoubleModeV vd #rmode)` and match it in a similar manner.
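
For context, the kind of loop the auto-vectorizer would be looking at is roughly the following. This is a sketch with hypothetical names, not the benchmark code from the patch, and whether such a loop actually gets vectorized depends on the JVM build, CPU features, and flags.

```java
final class RoundLoopSketch {
    // A simple counted loop over arrays: each iteration is one scalar
    // Math.round(float) call, i.e. one "round, then convert to int" pair
    // that could in principle be replaced by a vector round and a vector cast.
    static void roundAll(float[] in, int[] out) {
        for (int i = 0; i < in.length; i++) {
            out[i] = Math.round(in[i]);
        }
    }

    public static void main(String[] args) {
        float[] in = new float[1024];
        for (int i = 0; i < in.length; i++) {
            in[i] = i * 0.37f - 100.0f;
        }
        int[] out = new int[in.length];
        roundAll(in, out);
        System.out.println(in[0] + " -> " + out[0]);
    }
}
```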

test/hotspot/jtreg/compiler/c2/cr6340864/TestFloatVect.java line 33:

> 31:  * @run main/othervm -Xbatch -XX:CompileCommand=exclude,*::test() -Xmx128m -XX:MaxVectorSize=16 compiler.c2.cr6340864.TestFloatVect
> 32:  * @run main/othervm -Xbatch -XX:CompileCommand=exclude,*::test() -Xmx128m -XX:MaxVectorSize=32 compiler.c2.cr6340864.TestFloatVect
> 33:  * @run main/othervm -Xbatch -XX:CompileCommand=exclude,*::test() -XX:TieredStopAtLevel=2 -Xmx128m -XX:MaxVectorSize=32 compiler.c2.cr6340864.TestFloatVect

What's the purpose of `-XX:TieredStopAtLevel=2` from a testing perspective?

-------------

PR: https://git.openjdk.java.net/jdk/pull/7094