RFR: 8342393: Promote commutative vector IR node sharing [v17]

Emanuel Peter epeter at openjdk.org
Mon Jan 27 07:31:51 UTC 2025


On Thu, 23 Jan 2025 18:45:16 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Patch promotes the sharing of commutative vector IR with the same inputs but different input ordering.
>> Similar to scalar IR where we perform edge swapping by [sorting inputs](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/addnode.cpp#L122) based on node indices during IR idealization.
>> 
>> Following are the performance stats for JMH micro included with the patch.
>> 
>> 
>> Granite Rapids (P-core Xeon Server)
>> Baseline : 
>> Benchmark                                                                (size)   Mode  Cnt      Score   Error   Units
>> VectorCommutativeOperSharingBenchmark.commutativeByteOperationShairing     1024  thrpt    2   8982.549          ops/ms
>> VectorCommutativeOperSharingBenchmark.commutativeIntOperationShairing      1024  thrpt    2   6072.773          ops/ms
>> VectorCommutativeOperSharingBenchmark.commutativeLongOperationShairing     1024  thrpt    2   2368.856          ops/ms
>> VectorCommutativeOperSharingBenchmark.commutativeShortOperationShairing    1024  thrpt    2  15215.087          ops/ms
>> 
>> Withopt:
>> Benchmark                                                                (size)   Mode  Cnt      Score   Error   Units
>> VectorCommutativeOperSharingBenchmark.commutativeByteOperationShairing     1024  thrpt    2  11963.554          ops/ms
>> VectorCommutativeOperSharingBenchmark.commutativeIntOperationShairing      1024  thrpt    2   7036.088          ops/ms
>> VectorCommutativeOperSharingBenchmark.commutativeLongOperationShairing     1024  thrpt    2   2906.731          ops/ms
>> VectorCommutativeOperSharingBenchmark.commutativeShortOperationShairing    1024  thrpt    2  17148.131          ops/ms
>> 
>> Sierra Forest (E-core Xeon Server)
>> Baseline:
>> Benchmark                                                                (size)   Mode  Cnt     Score   Error   Units
>> VectorCommutativeOperSharingBenchmark.commutativeByteOperationShairing     1024  thrpt    2  2444.359          ops/ms
>> VectorCommutativeOperSharingBenchmark.commutativeIntOperationShairing      1024  thrpt    2  1710.256          ops/ms
>> VectorCommutativeOperSharingBenchmark.commutativeLongOperationShairing     1024  thrpt    2   308.766          ops/ms
>> VectorCommutativeOperSharingBenchmark.commutativeShortOperationShairing    1024  thrpt    2  3902.179          ops/ms
>> 
>> Withopt:
>> Benchmark                                                                (size)   Mode  Cnt     Score   Error   Units
>> VectorCommutativeOperSharingBenchmark.com...
>
> Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision:
> 
>   Modifed scheme not based over fragile node level flags base solution.

src/hotspot/share/opto/vectornode.cpp line 1056:

> 1054:   //   i.e. RES = VEC1.lanewise(OPER, VEC2, MASK) is semantically equivalent to
> 1055:   //        RES = VEC1.BLEND(VEC1.lanewise(OPER, VEC2), MASK)
> 1056:   // Here, first operand "VEC1" represents both source and destination vector.

Sorry @jatin-bhateja . But this still does not make sense to me. I thought the `RES` was the destination vector for the `BLEND`? Does `BLEND` not have 3 inputs (2 vectors + 1 mask vector), and one output vector?

So I would expect a `BLEND` to have a form like this:
`RES = BLEND(V1, V2, M)`

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/22863#discussion_r1930078016


More information about the hotspot-compiler-dev mailing list