RFR: 8333334: C2: Make result of `Node::dominates` more precise to enhance scalar replacement [v3]
Tobias Hartmann
thartmann at openjdk.org
Thu Jun 13 05:39:12 UTC 2024
On Tue, 11 Jun 2024 02:23:36 GMT, MaxXing <duke at openjdk.org> wrote:
>> This patch changes the algorithm of `Node::dominates` to make the result more precise, and allows the iterators of `ConcurrentHashMap` to be scalar replaced.
>>
>> The previous algorithm will return a conservative result when encountering a dead control flow, and only try the first two input paths of a multi-input Region node, which may prevent the scalar replacement in some cases.
>>
>> For example, with G1 GC enabled, C2 generates GC barriers for `ConcurrentHashMap` iteration operations at some early phases, and then eliminates them in a later IGVN, but `LoadNode` is also idealized in the same IGVN. This causes `LoadNode::Ideal` to see some dead barrier control flows, and refuse to split some instance field loads through Phi due to the conservative result of `Node::dominates`, and thus the scalar replacement can not be applied to iterators in the later macro elimination phase.
>>
>> This patch allows `Node::dominates` to try other paths of the last multi-input Region node when the first path is dead, and makes `ConcurrentHashMap` iteration ~30% faster:
>>
>>
>> Benchmark (nkeys) Mode Cnt Score Error Units
>> Maps.testConcurrentHashMapIterators 10000 avgt 15 414099.085 ± 33230.945 ns/op # baseline
>> Maps.testConcurrentHashMapIterators 10000 avgt 15 315490.281 ± 3037.056 ns/op # patch
>>
>>
>> Testing: tier1-4.
>
> MaxXing has updated the pull request incrementally with one additional commit since the last revision:
>
> Add IR test and update copyright.
> I tried to revert the change of main algorithm of Node::dominates, and just add some code to add the LoadNode back to the worklist if we met dead path. It works, still makes the iteration ~30% faster
That's great. Are we sure that what we detect as "dead code" can always be removed by IGVN? Otherwise, we risk an endless loop of putting the LoadNode back on the worklist and finding the dead code again, right?
> I'm still trying to reproduce this case without using ConcurrentHashMap, but haven't found a way yet. Can I add an IR test that depends on ConcurrentHashMap? (It might need to be updated if the implementation of ConcurrentHashMap changes, I guess.)
I think it would be great to have a standalone test, did you try with `-XX:+StressIGVN` and or `-XX:+StressIncrementalInlining`?
-------------
PR Review: https://git.openjdk.org/jdk/pull/19496#pullrequestreview-2114745795
More information about the hotspot-compiler-dev
mailing list