RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v4]
Emanuel Peter
epeter at openjdk.org
Tue Mar 18 08:37:15 UTC 2025
On Tue, 18 Mar 2025 07:37:12 GMT, kuaiwei <duke at openjdk.org> wrote:
>> In this patch, I extent the merge stores optimization to merge adjacents loads. Tier1 tests are passed in my machine.
>>
>> The benchmark result of MergeLoadBench.java
>> AMD EPYC 9T24 96-Core Processor:
>>
>> |name | -MergeLoads | +MergeLoads |delta|
>> |---|---|---|---|
>> |MergeLoadBench.getCharB |4352.150 |4407.435 | 55.29 |
>> |MergeLoadBench.getCharBU |4075.320 |4084.663 | 9.34 |
>> |MergeLoadBench.getCharBV |3221.302 |3221.528 | 0.23 |
>> |MergeLoadBench.getCharC |2235.433 |2238.796 | 3.36 |
>> |MergeLoadBench.getCharL |4363.244 |4372.281 | 9.04 |
>> |MergeLoadBench.getCharLU |4072.550 |4075.744 | 3.19 |
>> |MergeLoadBench.getCharLV |2227.825 |2231.612 | 3.79 |
>> |MergeLoadBench.getIntB |11199.935 |6869.030 | -4330.90 |
>> |MergeLoadBench.getIntBU |6853.862 |2763.923 | -4089.94 |
>> |MergeLoadBench.getIntBV |306.953 |309.911 | 2.96 |
>> |MergeLoadBench.getIntL |10426.843 |6523.716 | -3903.13 |
>> |MergeLoadBench.getIntLU |6740.847 |2602.701 | -4138.15 |
>> |MergeLoadBench.getIntLV |2233.151 |2231.745 | -1.41 |
>> |MergeLoadBench.getIntRB |11335.756 |8980.619 | -2355.14 |
>> |MergeLoadBench.getIntRBU |7439.873 |3190.208 | -4249.66 |
>> |MergeLoadBench.getIntRL |16323.040 |7786.842 | -8536.20 |
>> |MergeLoadBench.getIntRLU |7457.745 |3364.140 | -4093.61 |
>> |MergeLoadBench.getIntRU |2512.621 |2511.668 | -0.95 |
>> |MergeLoadBench.getIntU |2501.064 |2500.629 | -0.43 |
>> |MergeLoadBench.getLongB |21175.442 |21103.660 | -71.78 |
>> |MergeLoadBench.getLongBU |14042.046 |2512.784 | -11529.26 |
>> |MergeLoadBench.getLongBV |606.448 |606.171 | -0.28 |
>> |MergeLoadBench.getLongL |23142.178 |23217.785 | 75.61 |
>> |MergeLoadBench.getLongLU |14112.972 |2237.659 | -11875.31 |
>> |MergeLoadBench.getLongLV |2230.416 |2231.224 | 0.81 |
>> |MergeLoadBench.getLongRB |21152.558 |21140.583 | -11.98 |
>> |MergeLoadBench.getLongRBU |14031.178 |2520.317 | -11510.86 |
>> |MergeLoadBench.getLongRL |23248.506 |23136.410 | -112.10 |
>> |MergeLoadBench.getLongRLU |14125.032 |2240.481 | -11884.55 |
>> |Merg...
>
> kuaiwei has updated the pull request incrementally with one additional commit since the last revision:
>
> Revert extract value and add more tests
@kuaiwei Thanks for working on this!
I had a quick look at the IR tests and left a few comments already. Additionally, it would be good if you had some "chaotic" tests, i.e. ones where the loads are reordered but could still be merged, and some that should not be merged.
I'll keep reviewing now...
test/hotspot/jtreg/compiler/c2/TestMergeLoads.java line 44:
> 42: * @run main compiler.c2.TestMergeLoads aligned
> 43: *
> 44: * @requires os.arch != "riscv64" | vm.cpu.features ~= ".*zbb.*"
Can you remove this global requirement, so that those platforms can at least do result verification?
You can always add restrictions to the `@IR` rules.
test/hotspot/jtreg/compiler/c2/TestMergeLoads.java line 55:
> 53: byte[] aB = new byte[RANGE];
> 54: char[] aC = new char[RANGE];
> 55: short[] aS = new short[RANGE];
What about merging loads on `int[]`?
test/hotspot/jtreg/compiler/c2/TestMergeLoads.java line 69:
> 67: switch (args[0]) {
> 68: case "aligned" -> { framework.addFlags("-XX:-UseUnalignedAccesses"); }
> 69: case "unaligned" -> { framework.addFlags("-XX:+UseUnalignedAccesses"); }
Can you please also add an explicit run with `StressIGVN`? Because the flag is not whitelisted for the TestFramework, and so if it was set from the outside, the IR rules would not be executed. But it would be nice that your algorithm is stable to reorderings in IGVN ;)
test/hotspot/jtreg/compiler/c2/TestMergeLoads.java line 145:
> 143: a[i] = (short)RANDOM.nextInt();
> 144: }
> 145: }
Suggestion:
}
Please put a space between methods ;)
test/hotspot/jtreg/compiler/c2/TestMergeLoads.java line 235:
> 233: }
> 234: }
> 235: }
In the meantime, I've developed a `Verify.java`, exactly for this. Would you mind using it, it would reduce the amount of code here quite a bit ;)
test/hotspot/jtreg/compiler/c2/TestMergeLoads.java line 290:
> 288: int[] ret = {i1};
> 289: return new Object[]{ret};
> 290: }
Ah, I see you are using `|`. Can we also use `+`?
FYI: if you use `Verify.java`, then you could directly do `return i2;`. It would get returned as a boxed `Integer`, and should be compared that way, without allocating an `int[]` and `Object[]`.
test/hotspot/jtreg/compiler/c2/TestMergeLoads.java line 423:
> 421: @IR(counts = {IRNode.LOAD_I_OF_CLASS, "byte\\\\[int:>=0] \\\\(java/lang/Cloneable,java/io/Serializable\\\\)", "1"},
> 422: applyIf = {"UseUnalignedAccesses", "true"},
> 423: applyIfPlatform = {"big-endian", "true"})
Can you please also check for byte and char loads? It would make sure that we do not have any loads that we are not expecting, and that the graph was cleaned appropriately.
-------------
PR Review: https://git.openjdk.org/jdk/pull/24023#pullrequestreview-2693406473
PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2000438853
PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2000448210
PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2000446641
PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2000453897
PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2000455701
PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2000459936
PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2000465656
More information about the hotspot-compiler-dev
mailing list