RFR: 8333334: C2: Make result of `Node::dominates` more precise to enhance scalar replacement [v6]

Damon Fenacci dfenacci at openjdk.org
Mon Jul 8 12:06:34 UTC 2024


On Mon, 8 Jul 2024 07:06:07 GMT, Qizheng Xing <qxing at openjdk.org> wrote:

>> This patch changes the algorithm of `Node::dominates` to make the result more precise, and allows the iterators of `ConcurrentHashMap` to be scalar replaced.
>> 
>> The previous algorithm will return a conservative result when encountering a dead control flow, and only try the first two input paths of a multi-input Region node, which may prevent the scalar replacement in some cases.
>> 
>> For example, with G1 GC enabled, C2 generates GC barriers for `ConcurrentHashMap` iteration operations at some early phases, and then eliminates them in a later IGVN, but `LoadNode` is also idealized in the same IGVN. This causes `LoadNode::Ideal` to see some dead barrier control flows, and refuse to split some instance field loads through Phi due to the conservative result of `Node::dominates`, and thus the scalar replacement can not be applied to iterators in the later macro elimination phase.
>> 
>> This patch allows `Node::dominates` to try other paths of the last multi-input Region node when the first path is dead, and makes `ConcurrentHashMap` iteration ~30% faster:
>> 
>> 
>> Benchmark                            (nkeys)  Mode  Cnt        Score       Error  Units
>> Maps.testConcurrentHashMapIterators    10000  avgt   15   414099.085 ± 33230.945  ns/op   # baseline
>> Maps.testConcurrentHashMapIterators    10000  avgt   15   315490.281 ±  3037.056  ns/op   # patch
>> 
>> 
>> Testing: tier1-4.
>
> Qizheng Xing has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Make `Node::dominates` and `MemNode::all_controls_dominate` returns an `enum`.

Nice improvement @MaxXSoft!

Quick note:  don't you want to add your copyright notice line to the files you've modified?

test/hotspot/jtreg/compiler/c2/irTests/scalarReplacement/ScalarReplacementWithGCBarrierTests.java line 107:

> 105:     @IR(phase = { CompilePhase.AFTER_PARSING }, counts = { IRNode.ALLOC, "1" })
> 106:     @IR(phase = { CompilePhase.INCREMENTAL_BOXING_INLINE }, counts = { IRNode.ALLOC, "2" })
> 107:     @IR(applyIf = { "UseG1GC", "true" }, phase = { CompilePhase.ITER_GVN_AFTER_ELIMINATION }, counts = { IRNode.ALLOC, "1" })

Your test checks for the number of allocations to return to one. However in your description and comments you talk about load nodes that aren't folded. What about adding another IR test to check the number of loads for completeness?

-------------

PR Review: https://git.openjdk.org/jdk/pull/19496#pullrequestreview-2163093383
PR Review Comment: https://git.openjdk.org/jdk/pull/19496#discussion_r1668499629


More information about the hotspot-compiler-dev mailing list