RFR: 8289943: Simplify some object allocation merges [v11]
Vladimir Kozlov
kvn at openjdk.org
Wed Oct 5 22:47:26 UTC 2022
On Wed, 5 Oct 2022 21:43:15 GMT, Cesar Soares Lucas <cslucas at openjdk.org> wrote:
>> Hi there, can I please get some feedback on this approach to simplify object allocation merges in order to promote Scalar Replacement of the objects involved in the merge?
>>
>> The basic idea for this [approach was discussed in this thread](https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2022-April/055189.html) and it consists of:
>> 1) Identify Phi nodes that merge object allocations and replace them with a new IR node called ReducedAllocationMergeNode (RAM node).
>> 2) Scalar Replace the incoming allocations to the RAM node.
>> 3) Scalar Replace the RAM node itself.
>>
>> There are a few conditions for replacing the Phi with a RAM node, though I plan to work on removing them in subsequent PRs:
>>
>> - The only supported users of the original Phi are AddP->Load, SafePoints/Traps, DecodeN.
>>
>> These are the critical parts of the implementation and I'd appreciate it very much if you could tell me if what I implemented isn't violating any C2 IR constraints:
>>
>> - The way I identify/use the memory edges that will be used to find the last stored values to the merged object fields.
>> - The way I check if there is an incoming Allocate node to the original Phi node.
>> - The way I check if there is no store to the merged objects after they are merged.
>>
>> Testing:
>> - Windows/Linux/macOS fastdebug/release
>> - hotspot_all
>> - tier1
>> - Renaissance
>> - dacapo
>> - new IR-based tests
>
> Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision:
>
> Address PR feedback. Fix test & one bug. Set RAM parameter to true by default.
DaCapo and Renaissance are good for testing it, I think. That is where I see variations. Maybe we can try `-XX:-TieredCompilation` to see if using only C2 has an effect.
It seems we don't have a lot of cases where this optimization helps. Maybe, as future work, we can use these benchmarks (and others) to collect cases where this optimization does not work (or even causes the compilation to bail out).
BTW, were you able to remove all allocations in your test `run_IfElseInLoop()`?
What about the test case in JDK-6853701 (https://bugs.openjdk.org/browse/JDK-6853701)?
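
To make the question concrete, here is a minimal sketch of the kind of allocation merge being discussed: two allocations in the branches of an if/else inside a loop, merged at a Phi whose only uses are field loads. This is a hypothetical illustration, not the actual `run_IfElseInLoop()` test from the PR; the class `AllocationMergeSketch`, the `Point` type and the method `sumIfElseInLoop` are names assumed for the example.

```java
public class AllocationMergeSketch {
    // Hypothetical value holder; not taken from the PR's tests.
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Each branch allocates its own Point. After the if/else, `p` refers to
    // one of the two allocations (a Phi in C2's IR), and the only uses of
    // the merged object are loads of its fields.
    static int sumIfElseInLoop(int iterations) {
        int sum = 0;
        for (int i = 0; i < iterations; i++) {
            Point p;
            if ((i & 1) == 0) {
                p = new Point(i, 1);
            } else {
                p = new Point(1, i);
            }
            sum += p.x + p.y; // field loads through the merged reference
        }
        return sum;
    }

    public static void main(String[] args) {
        // Call the method many times so the JIT has a chance to compile it.
        long total = 0;
        for (int round = 0; round < 20_000; round++) {
            total += sumIfElseInLoop(1_000);
        }
        System.out.println(total);
    }
}
```

Running such a program with `-XX:-TieredCompilation` would exercise only C2; whether both allocations are actually removed depends on the conditions listed in the PR description.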
-------------
PR: https://git.openjdk.org/jdk/pull/9073