RFR: 8333334: C2: Make result of `Node::dominates` more precise to enhance scalar replacement [v2]
MaxXing
duke at openjdk.org
Thu Jun 6 07:32:39 UTC 2024
On Wed, 5 Jun 2024 05:40:12 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote:
>> MaxXing has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Revert last commit, and push the `LoadNode` back to the worklist to wait for the dead code to be removed.
>
> Impressive results! I haven't looked at the change yet but here are a few questions / requests:
> - Could you add a screenshot of the IR of the case you are describing?
> - Wouldn't it help to add the LoadNode back to the IGVN worklist and wait for the dead path to be removed?
> - Could you add an [IR framework](https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/compiler/lib/ir_framework/README.md) test that verifies that the optimization works as expected?
>
> Thanks,
> Tobias
@TobiHartmann Hi Tobias, thanks for your reply!
> Could you add a screenshot of the IR of the case you are describing?
Sure. Here's a simple example of iterating over the keys of `ConcurrentHashMap`:
public long sumMapKeys() {
long sum = 0;
Enumeration it = map.keys();
while (it.hasMoreElements()) {
sum += (Long) it.nextElement();
}
return sum;
}
And here's what `-XX:+PrintEscapeAnalysis -XX:+PrintEliminateAllocations` says:
JavaObject(6) NoEscape(NoEscape) [ 183F 189F 205F 196F 191F 179F 186F 202F 208F 233F 637F 1069F [ 104 109 ]] 92 Allocate === 76 6 69 8 1 (90 89 24 1 1 1 22 1 1 43 77 87 ) [[ 93 94 95 102 103 104 ]] rawptr:NotNull ( int:>=0, java/lang/Object:NotNull *, bool, top, bool ) ConcurrentHashMap::keys @ bci:16 (line 2152) MyBenchmark::sumMapKeys @ bci:6 (line 105) !jvms: ConcurrentHashMap::keys @ bci:16 (line 2152) MyBenchmark::sumMapKeys @ bci:6 (line 105)
LocalVar(60) [ 92P [ 109 183b 189b 205b ]] 104 Proj === 92 [[ 105 109 183 189 205 ]] #5 !jvms: ConcurrentHashMap::keys @ bci:16 (line 2152) MyBenchmark::sumMapKeys @ bci:6 (line 105)
LocalVar(103) [ 104 92P [ 196b 191b 179b 186b 202b 208b 233b 637b 1069b ]] 109 CheckCastPP === 106 104 [[ 1801 1771 1754 1584 1503 1503 1490 1490 1466 1466 1451 1451 208 1393 1381 1381 179 179 186 186 208 196 191 191 196 202 202 228 297 233 233 1363 1363 1393 1321 1321 1311 637 637 648 648 998 988 1311 1301 477 477 487 487 497 497 569 1301 672 685 767 672 978 920 539 539 1069 1069 557 557 1128 569 631 1061 685 631 ]] #java/util/concurrent/ConcurrentHashMap$KeyIterator (java/util/Iterator,java/util/Enumeration):NotNull:exact * Oop:java/util/concurrent/ConcurrentHashMap$KeyIterator (java/util/Iterator,java/util/Enumeration):NotNull:exact * !jvms: ConcurrentHashMap::keys @ bci:16 (line 2152) MyBenchmark::sumMapKeys @ bci:6 (line 105)
NotScalar (Field load) 109 CheckCastPP === 106 104 [[ 1801 1771 1754 1584 1503 1503 1490 1490 1128 569 1451 1451 208 1393 1381 1381 672 685 1069 1069 208 196 191 191 196 978 685 631 297 767 672 569 1301 1393 1321 1321 1311 557 557 631 1061 998 988 1311 1301 477 477 487 487 497 497 ]] #java/util/concurrent/ConcurrentHashMap$KeyIterator (java/util/Iterator,java/util/Enumeration):NotNull:exact *,iid=92 Oop:java/util/concurrent/ConcurrentHashMap$KeyIterator (java/util/Iterator,java/util/Enumeration):NotNull:exact *,iid=92 !jvms: ConcurrentHashMap::keys @ bci:16 (line 2152) MyBenchmark::sumMapKeys @ bci:6 (line 105)
>>>> 2186 LoadI === _ 1973 191 [[ 2185 ]] @java/util/concurrent/ConcurrentHashMap$Traverser+12 *, name=index, idx=9; #int !orig=[2178],[2171],[1373] !jvms: ConcurrentHashMap$Traverser::advance @ bci:51 (line 3369) ConcurrentHashMap$KeyIterator::next @ bci:28 (line 3468) ConcurrentHashMap$KeyIterator::nextElement @ bci:1 (line 3472) MyBenchmark::sumMapKeys @ bci:21 (line 107)
It shows that scalar replacement is aborted due to field load 2186:
<img width="909" alt="ir-diff" src="https://github.com/openjdk/jdk/assets/5129820/2e35769a-6e0f-4c8a-96ec-09923a5eb2c0">
As we can see its memory is a Phi, and it should be split by `LoadNode::split_through_phi` if its address `109 CheckCastPP` dominates the control flow of the Phi node `1330 Region`.
The control node of `CheckCastPP` is `106 Proj`:
<img width="360" alt="cast" src="https://github.com/openjdk/jdk/assets/5129820/ebf6f7f8-fcd1-4d83-8734-3f1864dcc946">
And it does dominate 1330, although not that obvious:
<img width="960" alt="cfg" src="https://github.com/openjdk/jdk/assets/5129820/e53b25de-c966-4f89-b30e-54ccc35f645b">
But `Node::dominates` think it doesn't because of the dead control flow.
> Wouldn't it help to add the LoadNode back to the IGVN worklist and wait for the dead path to be removed?
I tried to revert the change of main algorithm of `Node::dominates`, and just add some code to add the LoadNode back to the worklist if we met dead path. It works, still makes the iteration ~30% faster:
Benchmark (nkeys) Mode Cnt Score Error Units
Maps.testConcurrentHashMapIterators 10000 avgt 15 312720.415 ± 3255.500 ns/op
Thanks for pointing this out. I updated this PR, and the latest commit is passing test tier1-4.
> Could you add an [IR framework](https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/compiler/lib/ir_framework/README.md) test that verifies that the optimization works as expected?
I'm still trying to reproduce this case without using `ConcurrentHashMap`, but haven't found a way yet. Can I add an IR test that depends on `ConcurrentHashMap`? (It might need to be updated if the implementation of `ConcurrentHashMap` changes, I guess.)
-------------
PR Comment: https://git.openjdk.org/jdk/pull/19496#issuecomment-2151588662
More information about the hotspot-compiler-dev
mailing list