RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v12]

Emanuel Peter epeter at openjdk.org
Thu Apr 17 09:44:04 UTC 2025


On Mon, 7 Apr 2025 02:24:03 GMT, kuaiwei <duke at openjdk.org> wrote:

>> In this patch, I extent the merge stores optimization to merge adjacents loads. Tier1 tests are passed in my machine.
>> 
>> The benchmark result of MergeLoadBench.java
>> AMD EPYC 9T24 96-Core Processor:
>> 
>> |name   | -MergeLoads   | +MergeLoads   |delta|
>> |---|---|---|---|
>> |MergeLoadBench.getCharB        |4352.150       |4407.435       | 55.29 |
>> |MergeLoadBench.getCharBU       |4075.320       |4084.663       | 9.34 |
>> |MergeLoadBench.getCharBV       |3221.302       |3221.528       | 0.23 |
>> |MergeLoadBench.getCharC        |2235.433       |2238.796       | 3.36 |
>> |MergeLoadBench.getCharL        |4363.244       |4372.281       | 9.04 |
>> |MergeLoadBench.getCharLU       |4072.550       |4075.744       | 3.19 |
>> |MergeLoadBench.getCharLV       |2227.825       |2231.612       | 3.79 |
>> |MergeLoadBench.getIntB |11199.935      |6869.030       | -4330.90 |
>> |MergeLoadBench.getIntBU        |6853.862       |2763.923       | -4089.94 |
>> |MergeLoadBench.getIntBV        |306.953        |309.911        | 2.96 |
>> |MergeLoadBench.getIntL |10426.843      |6523.716       | -3903.13 |
>> |MergeLoadBench.getIntLU        |6740.847       |2602.701       | -4138.15 |
>> |MergeLoadBench.getIntLV        |2233.151       |2231.745       | -1.41 |
>> |MergeLoadBench.getIntRB        |11335.756      |8980.619       | -2355.14 |
>> |MergeLoadBench.getIntRBU       |7439.873       |3190.208       | -4249.66 |
>> |MergeLoadBench.getIntRL        |16323.040      |7786.842       | -8536.20 |
>> |MergeLoadBench.getIntRLU       |7457.745       |3364.140       | -4093.61 |
>> |MergeLoadBench.getIntRU        |2512.621       |2511.668       | -0.95 |
>> |MergeLoadBench.getIntU |2501.064       |2500.629       | -0.43 |
>> |MergeLoadBench.getLongB        |21175.442      |21103.660      | -71.78 |
>> |MergeLoadBench.getLongBU       |14042.046      |2512.784       | -11529.26 |
>> |MergeLoadBench.getLongBV       |606.448        |606.171        | -0.28 |
>> |MergeLoadBench.getLongL        |23142.178      |23217.785      | 75.61 |
>> |MergeLoadBench.getLongLU       |14112.972      |2237.659       | -11875.31 |
>> |MergeLoadBench.getLongLV       |2230.416       |2231.224       | 0.81 |
>> |MergeLoadBench.getLongRB       |21152.558      |21140.583      | -11.98 |
>> |MergeLoadBench.getLongRBU      |14031.178      |2520.317       | -11510.86 |
>> |MergeLoadBench.getLongRL       |23248.506      |23136.410      | -112.10 |
>> |MergeLoadBench.getLongRLU      |14125.032      |2240.481       | -11884.55 |
>> |Merg...
>
> kuaiwei has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits:
> 
>  - Merge remote-tracking branch 'origin/master' into dev/merge_loads
>  - Remove unused code
>  - Move code to addnode.cpp and add more tests
>  - Merge remote-tracking branch 'origin/master' into dev/merge_loads
>  - Fix test
>  - Add more tests
>  - Enable StressIGVN and riscv platform
>  - Change tests as review comments
>  - Fix test failure and change for review comments
>  - Revert extract value and add more tests
>  - ... and 4 more: https://git.openjdk.org/jdk/compare/660b17a6...f6518b26

First batch, need to change trains, I'll continue later :)

src/hotspot/share/opto/addnode.cpp line 800:

> 798:  *
> 799:  * Because array value is logical and with constant 0xff/0xffff, LoadS/LoadB is converted to an unsigned load
> 800:  * and 'And' node is eliminated in previous IGVN phase. Please check AndINode::Ideal for reference

What do you mean by `array value is logical`?
Do you have tests where we do not have the `And` in the code, and there would be implicit sign extension?

src/hotspot/share/opto/addnode.cpp line 828:

> 826: };
> 827: 
> 828: typedef GrowableArray<MergeLoadInfo*> MergeLoadInfoList;

Do you need to have this Resource allocated? Why not just have the elements in-place? And then make the `MergeLoadInfo` a `StackObj`. That would remove the extra pointer indirection.

src/hotspot/share/opto/addnode.cpp line 843:

> 841:   Node*     const    _combine;
> 842:   MemoryAdjacentStatus _order;
> 843:   bool _require_reverse_bytes;    // Do we need add a ReverseBytes for merged load

Suggestion:

  bool _require_reverse_bytes;    // Do we need to add a ReverseBytes for merged load

src/hotspot/share/opto/addnode.cpp line 857:

> 855: 
> 856: private:
> 857:   // Detect the embedding combine node is a candidate for merging loads

Suggestion:

  // Detect if the embedding combine node is a candidate for merging loads

-------------

Changes requested by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/24023#pullrequestreview-2775153032
PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2048591407
PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2048598807
PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2048600811
PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2048602890


More information about the hotspot-compiler-dev mailing list