RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v12]
Emanuel Peter
epeter at openjdk.org
Thu Apr 17 10:16:58 UTC 2025
On Mon, 7 Apr 2025 02:24:03 GMT, kuaiwei <duke at openjdk.org> wrote:
>> In this patch, I extent the merge stores optimization to merge adjacents loads. Tier1 tests are passed in my machine.
>>
>> The benchmark result of MergeLoadBench.java
>> AMD EPYC 9T24 96-Core Processor:
>>
>> |name | -MergeLoads | +MergeLoads |delta|
>> |---|---|---|---|
>> |MergeLoadBench.getCharB |4352.150 |4407.435 | 55.29 |
>> |MergeLoadBench.getCharBU |4075.320 |4084.663 | 9.34 |
>> |MergeLoadBench.getCharBV |3221.302 |3221.528 | 0.23 |
>> |MergeLoadBench.getCharC |2235.433 |2238.796 | 3.36 |
>> |MergeLoadBench.getCharL |4363.244 |4372.281 | 9.04 |
>> |MergeLoadBench.getCharLU |4072.550 |4075.744 | 3.19 |
>> |MergeLoadBench.getCharLV |2227.825 |2231.612 | 3.79 |
>> |MergeLoadBench.getIntB |11199.935 |6869.030 | -4330.90 |
>> |MergeLoadBench.getIntBU |6853.862 |2763.923 | -4089.94 |
>> |MergeLoadBench.getIntBV |306.953 |309.911 | 2.96 |
>> |MergeLoadBench.getIntL |10426.843 |6523.716 | -3903.13 |
>> |MergeLoadBench.getIntLU |6740.847 |2602.701 | -4138.15 |
>> |MergeLoadBench.getIntLV |2233.151 |2231.745 | -1.41 |
>> |MergeLoadBench.getIntRB |11335.756 |8980.619 | -2355.14 |
>> |MergeLoadBench.getIntRBU |7439.873 |3190.208 | -4249.66 |
>> |MergeLoadBench.getIntRL |16323.040 |7786.842 | -8536.20 |
>> |MergeLoadBench.getIntRLU |7457.745 |3364.140 | -4093.61 |
>> |MergeLoadBench.getIntRU |2512.621 |2511.668 | -0.95 |
>> |MergeLoadBench.getIntU |2501.064 |2500.629 | -0.43 |
>> |MergeLoadBench.getLongB |21175.442 |21103.660 | -71.78 |
>> |MergeLoadBench.getLongBU |14042.046 |2512.784 | -11529.26 |
>> |MergeLoadBench.getLongBV |606.448 |606.171 | -0.28 |
>> |MergeLoadBench.getLongL |23142.178 |23217.785 | 75.61 |
>> |MergeLoadBench.getLongLU |14112.972 |2237.659 | -11875.31 |
>> |MergeLoadBench.getLongLV |2230.416 |2231.224 | 0.81 |
>> |MergeLoadBench.getLongRB |21152.558 |21140.583 | -11.98 |
>> |MergeLoadBench.getLongRBU |14031.178 |2520.317 | -11510.86 |
>> |MergeLoadBench.getLongRL |23248.506 |23136.410 | -112.10 |
>> |MergeLoadBench.getLongRLU |14125.032 |2240.481 | -11884.55 |
>> |Merg...
>
> kuaiwei has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits:
>
> - Merge remote-tracking branch 'origin/master' into dev/merge_loads
> - Remove unused code
> - Move code to addnode.cpp and add more tests
> - Merge remote-tracking branch 'origin/master' into dev/merge_loads
> - Fix test
> - Add more tests
> - Enable StressIGVN and riscv platform
> - Change tests as review comments
> - Fix test failure and change for review comments
> - Revert extract value and add more tests
> - ... and 4 more: https://git.openjdk.org/jdk/compare/660b17a6...f6518b26
Even more coming later...
src/hotspot/share/opto/addnode.cpp line 816:
> 814: LoadNode* const _load;
> 815: Node* const _combine;
> 816: int const _shift;
Suggestion:
jint const _shift;
I prefer using `jint` etc when we are talking about values that correlate to java types. With the C/C++ numerical types, there could always be issues on different platforms with different bit sizes.
src/hotspot/share/opto/addnode.cpp line 823:
> 821: void dump() {
> 822: tty->print_cr("MergeLoadInfo: load: %d, combine: %d, shift: %d",
> 823: _load->_idx, _combine->_idx, _shift);
It would be nice if you also printed the Node name. e.g. `_load->Name()`. For me that is often more helpful.
src/hotspot/share/opto/addnode.cpp line 912:
> 910: return false;
> 911: }
> 912: return true;
This feels inverted. I would do a switch case here anyway, and return `false` in the default case.
Hmm. LoadS is supposed to be implicitly sign extended. Can those upper sign bits not lead to issues when OR-ing without a mask?
src/hotspot/share/opto/addnode.cpp line 916:
> 914:
> 915: bool MergePrimitiveLoads::is_supported_combine_opcode(int opc) {
> 916: return opc == Op_OrI || opc == Op_OrL;
Ah, and here you do it "positively". I would also recommend using a switch case, it is nicer to extend later for AND and XOR.
src/hotspot/share/opto/addnode.cpp line 920:
> 918:
> 919: // Go through ConvI2L which is unique output of input node
> 920: const Node* MergePrimitiveLoads::by_pass_i2l(const Node* n) {
Suggestion:
const Node* MergePrimitiveLoads::bypass_ConvI2L(const Node* n) {
I think `bypass` is a single verb, not two wordy ;)
src/hotspot/share/opto/addnode.cpp line 933:
> 931: *
> 932: * Load -> OrI/OrL
> 933: * It has no LShift usage and the And node is optimized out in previous optimization
This is a bit confusing... I thought what you wrote further up was much better, where you also had the ascii art. This here feels like a duplication of that up there. If there is anything here that is missing up there, then I would suggest that you just move it up there.
src/hotspot/share/opto/addnode.cpp line 946:
> 944: * | ((UNSAFE.getByte(array, address + 3) & 0xff) << 24);
> 945: */
> 946: bool MergePrimitiveLoads::is_merged_load_candidate() const {
What is your definition of candidate here? It seems to have something to do about not having a `LShift`, why? Maybe you can give a good definition here or somewhere else?
src/hotspot/share/opto/addnode.cpp line 950:
> 948: const Node* check = by_pass_i2l(_combine);
> 949: if (check->outcnt() == 1 && check->unique_out()->Opcode() == _combine->Opcode()) {
> 950: // It's in the middle of combine operators
Ah, I see. This is about checking that there is nothing further down to merge with....
But are you sure this is a good idea to check that there is no `OR` below? I mean there could be valid uses of `OrI` below that do some OR-ing with something else... Now you are forbidding these cases.
Look at this:
int x = (... merge load pattern with OR ...);
int y = ... some other value ...
int z = x | y;
I would expect that we could merge the loads here too, but your pattern matching seems to forbid it, right?
-------------
Changes requested by epeter (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/24023#pullrequestreview-2775212830
PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2048628508
PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2048631063
PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2048634906
PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2048635926
PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2048656876
PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2048641180
PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2048647672
PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2048652111
More information about the hotspot-compiler-dev
mailing list