RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v12]
Emanuel Peter
epeter at openjdk.org
Thu Apr 17 09:44:04 UTC 2025
On Mon, 7 Apr 2025 02:24:03 GMT, kuaiwei <duke at openjdk.org> wrote:
>> In this patch, I extent the merge stores optimization to merge adjacents loads. Tier1 tests are passed in my machine.
>>
>> The benchmark result of MergeLoadBench.java
>> AMD EPYC 9T24 96-Core Processor:
>>
>> |name | -MergeLoads | +MergeLoads |delta|
>> |---|---|---|---|
>> |MergeLoadBench.getCharB |4352.150 |4407.435 | 55.29 |
>> |MergeLoadBench.getCharBU |4075.320 |4084.663 | 9.34 |
>> |MergeLoadBench.getCharBV |3221.302 |3221.528 | 0.23 |
>> |MergeLoadBench.getCharC |2235.433 |2238.796 | 3.36 |
>> |MergeLoadBench.getCharL |4363.244 |4372.281 | 9.04 |
>> |MergeLoadBench.getCharLU |4072.550 |4075.744 | 3.19 |
>> |MergeLoadBench.getCharLV |2227.825 |2231.612 | 3.79 |
>> |MergeLoadBench.getIntB |11199.935 |6869.030 | -4330.90 |
>> |MergeLoadBench.getIntBU |6853.862 |2763.923 | -4089.94 |
>> |MergeLoadBench.getIntBV |306.953 |309.911 | 2.96 |
>> |MergeLoadBench.getIntL |10426.843 |6523.716 | -3903.13 |
>> |MergeLoadBench.getIntLU |6740.847 |2602.701 | -4138.15 |
>> |MergeLoadBench.getIntLV |2233.151 |2231.745 | -1.41 |
>> |MergeLoadBench.getIntRB |11335.756 |8980.619 | -2355.14 |
>> |MergeLoadBench.getIntRBU |7439.873 |3190.208 | -4249.66 |
>> |MergeLoadBench.getIntRL |16323.040 |7786.842 | -8536.20 |
>> |MergeLoadBench.getIntRLU |7457.745 |3364.140 | -4093.61 |
>> |MergeLoadBench.getIntRU |2512.621 |2511.668 | -0.95 |
>> |MergeLoadBench.getIntU |2501.064 |2500.629 | -0.43 |
>> |MergeLoadBench.getLongB |21175.442 |21103.660 | -71.78 |
>> |MergeLoadBench.getLongBU |14042.046 |2512.784 | -11529.26 |
>> |MergeLoadBench.getLongBV |606.448 |606.171 | -0.28 |
>> |MergeLoadBench.getLongL |23142.178 |23217.785 | 75.61 |
>> |MergeLoadBench.getLongLU |14112.972 |2237.659 | -11875.31 |
>> |MergeLoadBench.getLongLV |2230.416 |2231.224 | 0.81 |
>> |MergeLoadBench.getLongRB |21152.558 |21140.583 | -11.98 |
>> |MergeLoadBench.getLongRBU |14031.178 |2520.317 | -11510.86 |
>> |MergeLoadBench.getLongRL |23248.506 |23136.410 | -112.10 |
>> |MergeLoadBench.getLongRLU |14125.032 |2240.481 | -11884.55 |
>> |Merg...
>
> kuaiwei has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits:
>
> - Merge remote-tracking branch 'origin/master' into dev/merge_loads
> - Remove unused code
> - Move code to addnode.cpp and add more tests
> - Merge remote-tracking branch 'origin/master' into dev/merge_loads
> - Fix test
> - Add more tests
> - Enable StressIGVN and riscv platform
> - Change tests as review comments
> - Fix test failure and change for review comments
> - Revert extract value and add more tests
> - ... and 4 more: https://git.openjdk.org/jdk/compare/660b17a6...f6518b26
First batch, need to change trains, I'll continue later :)
src/hotspot/share/opto/addnode.cpp line 800:
> 798: *
> 799: * Because array value is logical and with constant 0xff/0xffff, LoadS/LoadB is converted to an unsigned load
> 800: * and 'And' node is eliminated in previous IGVN phase. Please check AndINode::Ideal for reference
What do you mean by `array value is logical`?
Do you have tests where we do not have the `And` in the code, and there would be implicit sign extension?
src/hotspot/share/opto/addnode.cpp line 828:
> 826: };
> 827:
> 828: typedef GrowableArray<MergeLoadInfo*> MergeLoadInfoList;
Do you need to have this Resource allocated? Why not just have the elements in-place? And then make the `MergeLoadInfo` a `StackObj`. That would remove the extra pointer indirection.
src/hotspot/share/opto/addnode.cpp line 843:
> 841: Node* const _combine;
> 842: MemoryAdjacentStatus _order;
> 843: bool _require_reverse_bytes; // Do we need add a ReverseBytes for merged load
Suggestion:
bool _require_reverse_bytes; // Do we need to add a ReverseBytes for merged load
src/hotspot/share/opto/addnode.cpp line 857:
> 855:
> 856: private:
> 857: // Detect the embedding combine node is a candidate for merging loads
Suggestion:
// Detect if the embedding combine node is a candidate for merging loads
-------------
Changes requested by epeter (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/24023#pullrequestreview-2775153032
PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2048591407
PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2048598807
PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2048600811
PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2048602890
More information about the hotspot-compiler-dev
mailing list