RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v4]
Emanuel Peter
epeter at openjdk.org
Tue Mar 18 09:05:17 UTC 2025
On Tue, 18 Mar 2025 07:37:12 GMT, kuaiwei <duke at openjdk.org> wrote:
>> In this patch, I extent the merge stores optimization to merge adjacents loads. Tier1 tests are passed in my machine.
>>
>> The benchmark result of MergeLoadBench.java
>> AMD EPYC 9T24 96-Core Processor:
>>
>> |name | -MergeLoads | +MergeLoads |delta|
>> |---|---|---|---|
>> |MergeLoadBench.getCharB |4352.150 |4407.435 | 55.29 |
>> |MergeLoadBench.getCharBU |4075.320 |4084.663 | 9.34 |
>> |MergeLoadBench.getCharBV |3221.302 |3221.528 | 0.23 |
>> |MergeLoadBench.getCharC |2235.433 |2238.796 | 3.36 |
>> |MergeLoadBench.getCharL |4363.244 |4372.281 | 9.04 |
>> |MergeLoadBench.getCharLU |4072.550 |4075.744 | 3.19 |
>> |MergeLoadBench.getCharLV |2227.825 |2231.612 | 3.79 |
>> |MergeLoadBench.getIntB |11199.935 |6869.030 | -4330.90 |
>> |MergeLoadBench.getIntBU |6853.862 |2763.923 | -4089.94 |
>> |MergeLoadBench.getIntBV |306.953 |309.911 | 2.96 |
>> |MergeLoadBench.getIntL |10426.843 |6523.716 | -3903.13 |
>> |MergeLoadBench.getIntLU |6740.847 |2602.701 | -4138.15 |
>> |MergeLoadBench.getIntLV |2233.151 |2231.745 | -1.41 |
>> |MergeLoadBench.getIntRB |11335.756 |8980.619 | -2355.14 |
>> |MergeLoadBench.getIntRBU |7439.873 |3190.208 | -4249.66 |
>> |MergeLoadBench.getIntRL |16323.040 |7786.842 | -8536.20 |
>> |MergeLoadBench.getIntRLU |7457.745 |3364.140 | -4093.61 |
>> |MergeLoadBench.getIntRU |2512.621 |2511.668 | -0.95 |
>> |MergeLoadBench.getIntU |2501.064 |2500.629 | -0.43 |
>> |MergeLoadBench.getLongB |21175.442 |21103.660 | -71.78 |
>> |MergeLoadBench.getLongBU |14042.046 |2512.784 | -11529.26 |
>> |MergeLoadBench.getLongBV |606.448 |606.171 | -0.28 |
>> |MergeLoadBench.getLongL |23142.178 |23217.785 | 75.61 |
>> |MergeLoadBench.getLongLU |14112.972 |2237.659 | -11875.31 |
>> |MergeLoadBench.getLongLV |2230.416 |2231.224 | 0.81 |
>> |MergeLoadBench.getLongRB |21152.558 |21140.583 | -11.98 |
>> |MergeLoadBench.getLongRBU |14031.178 |2520.317 | -11510.86 |
>> |MergeLoadBench.getLongRL |23248.506 |23136.410 | -112.10 |
>> |MergeLoadBench.getLongRLU |14125.032 |2240.481 | -11884.55 |
>> |Merg...
>
> kuaiwei has updated the pull request incrementally with one additional commit since the last revision:
>
> Revert extract value and add more tests
Ok, I gave it a quick first scan, and have some more questions and suggestions :)
src/hotspot/share/opto/memnode.cpp line 1853:
> 1851: * +---> Or2 <----+ |
> 1852: * | |
> 1853: * +-----> Or3 <------+
The code above has masking, the graph not. Can you add an explanatory comment, please ;)
src/hotspot/share/opto/memnode.cpp line 1855:
> 1853: * +-----> Or3 <------+
> 1854: *
> 1855: * It will be transformed as a merged LoadI and replace the Or3 node
Suggestion:
* It is transformed as a merged LoadI, which replaces the Or3 node.
src/hotspot/share/opto/memnode.cpp line 1862:
> 1860: /*
> 1861: * LoadNode and OrNode pair which represent an item for merging,
> 1862: * And we can get some properties like shift and last_op from it.
To me it seems that `load` and `shift` are the relevant elements here, and the `or` and `last_op` are the "side-info", only used if it is the last op. Is that correct?
src/hotspot/share/opto/memnode.cpp line 1877:
> 1875: _last_op(false) {}
> 1876: void set_last_op(bool v) { _last_op = v; }
> 1877: bool last_op() const { return _last_op; }
Is it feasible to make all fields `const`? It can often make reasoning about code easier if you know that there can be no modifications. Not sure about `_last_op`, I'll have to keep reading to find out.
src/hotspot/share/opto/memnode.cpp line 1896:
> 1894: LowToHigh, // Adjacent and first load access low address
> 1895: HighToLow, // Adjacent and first load access high address
> 1896: NotAdjacent // Not adjacent
What happens if an `OrNode` has its inputs swapped? This can happen if the node idx are the "wrong way around". See `commute`, comment `Otherwise, sort inputs (commutativity) to help value numbering`.
I don't know how likely this is to happen. What do you think?
src/hotspot/share/opto/memnode.cpp line 1902:
> 1900: PhaseGVN* const _phase;
> 1901: LoadNode* const _load;
> 1902: int _last_op_index; // Index of the last item in merged_list
What is the `merged_list`? I could not find it.
src/hotspot/share/opto/memnode.cpp line 1976:
> 1974: // Go through ConvI2L which is unique output of the load
> 1975: Node* MergePrimitiveLoads::by_pass_i2l(const LoadNode* l) {
> 1976: if ( l != nullptr && l->outcnt() == 1 && l->unique_out()->Opcode() == Op_ConvI2L) {
Suggestion:
if (l != nullptr && l->outcnt() == 1 && l->unique_out()->Opcode() == Op_ConvI2L) {
src/hotspot/share/opto/memnode.cpp line 1979:
> 1977: return l->unique_out();
> 1978: } else {
> 1979: return (Node*)l;
Hmm, I don't like casting away `const`... is there a way to avoid this?
src/hotspot/share/opto/memnode.cpp line 2522:
> 2520: MergePrimitiveLoads merge(phase, this);
> 2521: Node* merged = merge.run();
> 2522: if (merged != nullptr) { return merged; }
I'm a little confused here... So imagine we have a `LoadB` here. How can we now return a `LoadI` instead, and replace all uses of the `LoadB` with the `LoadI`? Should we not be replacing the `OrI` instead?
-------------
PR Review: https://git.openjdk.org/jdk/pull/24023#pullrequestreview-2693479230
PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2000489282
PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2000478996
PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2000495348
PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2000484074
PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2000505715
PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2000508791
PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2000516974
PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2000512944
PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2000526992
More information about the hotspot-compiler-dev
mailing list