RFR: 8329077: C2 SuperWord: Add MoveD2L, MoveL2D, MoveF2I, MoveI2F [v3]

Wed Aug 20 06:55:44 UTC 2025

On Tue, 5 Aug 2025 11:39:43 GMT, Galder Zamarreño <galder at openjdk.org> wrote:

>> I've added support to vectorize `MoveD2L`, `MoveL2D`, `MoveF2I` and `MoveI2F` nodes. The implementation follows a similar pattern to what is done with conversion (`Conv*`) nodes. The tests in `TestCompatibleUseDefTypeSize` have been updated with the new expectations.
>> 
>> Also added a JMH benchmark which measures throughput (the higher the number the better) for methods that exercise these nodes. On darwin/aarch64 it shows:
>> 
>> 
>> Benchmark                                (seed)  (size)   Mode  Cnt      Base      Patch   Units   Diff
>> VectorBitConversion.doubleToLongBits          0    2048  thrpt    8  1168.782   1157.717  ops/ms    -1%
>> VectorBitConversion.doubleToRawLongBits       0    2048  thrpt    8  3999.387   7353.936  ops/ms   +83%
>> VectorBitConversion.floatToIntBits            0    2048  thrpt    8  1200.338   1188.206  ops/ms    -1%
>> VectorBitConversion.floatToRawIntBits         0    2048  thrpt    8  4058.248  14792.474  ops/ms  +264%
>> VectorBitConversion.intBitsToFloat            0    2048  thrpt    8  3050.313  14984.246  ops/ms  +391%
>> VectorBitConversion.longBitsToDouble          0    2048  thrpt    8  3022.691   7379.360  ops/ms  +144%
>> 
>> 
>> The improvements observed are a result of vectorization. `doubleToLongBits` and `floatToIntBits` do not vectorize because of control flow, so their unchanged numbers show that this patch does not affect their performance.
>> 
>> I've run the tier1-3 tests on linux/aarch64 and didn't observe any regressions.
>
> Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Check at the very least that auto vectorization is supported

I had another quick look and have a few more suggestions for the tests/benchmarks.
But I think the VM changes are solid :)

test/hotspot/jtreg/compiler/loopopts/superword/TestCompatibleUseDefTypeSize.java line 407:

> 405: 
> 406:     @Test
> 407:     @IR(counts = {IRNode.STORE_VECTOR, "> 0"},

Since you are already fixing up some things here, and we want to be really sure that the vectorized code generates correct results, can you please do the following:
- Add IR rule counts not just for the store, but also for the load and the MoveX2Y. For negative rules it is OK to check only the store, but for positive rules we should try to list all the vector nodes we expect.
- Replace the `Random` usage with `Generators`. This ensures we cover NaNs and other special values more often.
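
To illustrate the second point, here is a minimal, self-contained sketch of why plain random inputs are weak for bit-conversion tests and what mixing in special values could look like. It does not use the `Generators` library from the test framework; the class `SpecialValueInputs` and its methods `fill`, `kernel` and `countMismatches` are hypothetical names made up for this example, and the 1-in-4 mixing ratio is an arbitrary assumption.

```java
import java.util.Random;

// Hypothetical helper, NOT the test library's Generators API.
class SpecialValueInputs {
    // Edge-case bit patterns. Only quiet NaNs are listed, because the
    // Float.intBitsToFloat documentation warns that signaling NaN bit
    // patterns may not survive on some platforms.
    static final int[] SPECIAL_BITS = {
        0x00000000, // +0.0f
        0x80000000, // -0.0f
        0x7f800000, // +Infinity
        0xff800000, // -Infinity
        0x7fc00000, // canonical quiet NaN
        0x7fc00001, // quiet NaN with a payload
        0xffc12345, // quiet NaN with sign bit set and a payload
        0x00000001, // Float.MIN_VALUE (smallest subnormal)
        0x7f7fffff, // Float.MAX_VALUE
    };

    // Roughly one element in four is a special value (assumed ratio);
    // the rest are uniformly random bit patterns.
    static float[] fill(int size, long seed) {
        Random r = new Random(seed);
        float[] a = new float[size];
        for (int i = 0; i < size; i++) {
            int bits = (r.nextInt(4) == 0)
                    ? SPECIAL_BITS[r.nextInt(SPECIAL_BITS.length)]
                    : r.nextInt();
            a[i] = Float.intBitsToFloat(bits);
        }
        return a;
    }

    // The loop shape under test: one raw bit move per element.
    static int[] kernel(float[] in) {
        int[] out = new int[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = Float.floatToRawIntBits(in[i]);
        }
        return out;
    }

    // Decode each result and compare with the input. Float.compare treats
    // all NaNs as equal and distinguishes -0.0f from +0.0f.
    static int countMismatches(float[] in, int[] out) {
        int mismatches = 0;
        for (int i = 0; i < in.length; i++) {
            if (Float.compare(Float.intBitsToFloat(out[i]), in[i]) != 0) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        float[] in = fill(2048, 42L);
        System.out.println("mismatches: " + countMismatches(in, kernel(in)));
    }
}
```

With inputs drawn as uniformly random floating-point values, NaNs, infinities and signed zeros essentially never appear, so a bug that only affects those values could go unnoticed. In the real test, the comparison would be against a non-vectorized reference run rather than a decode-and-compare step.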

test/micro/org/openjdk/bench/vm/compiler/VectorBitConversion.java line 90:

> 88: 
> 89:     @Benchmark
> 90:     public long[] doubleToLongBits() {

I wonder whether we should instead just extend this existing benchmark, which already has `convertI2F` etc.:
`test/micro/org/openjdk/bench/vm/compiler/TypeVectorOperations.java`

Just a suggestion, we can also keep them separate. Maybe one day we should clean up the benchmarks, put them all in some `autovectorization` subdirectory, and organize the files and benchmarks a little better.
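
For readers less familiar with the two families of methods in this benchmark, the following sketch contrasts them. It is my own illustration, not code from the PR: `RawVsCanonicalBits`, `canonicalBits`, `rawLoop` and `canonicalLoop` are hypothetical names. The only facts relied on are from the `java.lang.Double` documentation: `doubleToLongBits` collapses every NaN to `0x7ff8000000000000L`, while `doubleToRawLongBits` preserves the bits.

```java
class RawVsCanonicalBits {
    static final long CANONICAL_NAN = 0x7ff8000000000000L;

    // Behaves like Double.doubleToLongBits as documented: a per-element
    // NaN check followed by either a constant or the raw bit move.
    static long canonicalBits(double d) {
        return Double.isNaN(d) ? CANONICAL_NAN : Double.doubleToRawLongBits(d);
    }

    // Loop body is a single raw bit move per element (the MoveD2L shape).
    static long[] rawLoop(double[] in) {
        long[] out = new long[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = Double.doubleToRawLongBits(in[i]);
        }
        return out;
    }

    // Loop body additionally contains the NaN canonicalization.
    static long[] canonicalLoop(double[] in) {
        long[] out = new long[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = Double.doubleToLongBits(in[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        double payloadNaN = Double.longBitsToDouble(0x7ff8000000000001L);
        double[] in = {1.5, payloadNaN};
        System.out.println(Long.toHexString(rawLoop(in)[1]));
        System.out.println(Long.toHexString(canonicalLoop(in)[1]));
    }
}
```

The extra NaN check in the second loop is presumably the control flow that the PR description names as the reason `doubleToLongBits` and `floatToIntBits` see no speedup, whereas the raw variants reduce to a plain bit move.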

-------------

Changes requested by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/26457#pullrequestreview-3135001659
PR Review Comment: https://git.openjdk.org/jdk/pull/26457#discussion_r2287146098
PR Review Comment: https://git.openjdk.org/jdk/pull/26457#discussion_r2287159346