RFR: 8289943: Simplify some object allocation merges [v13]

Vladimir Ivanov vlivanov at openjdk.org
Wed Oct 26 23:07:01 UTC 2022


On Thu, 6 Oct 2022 16:50:28 GMT, Cesar Soares Lucas <cslucas at openjdk.org> wrote:

>> Hi there, can I please get some feedback on this approach to simplify object allocation merges in order to promote Scalar Replacement of the objects involved in the merge?
>> 
>> The basic idea for this [approach was discussed in this thread](https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2022-April/055189.html) and it consists of: 
>> 1) Identify Phi nodes that merge object allocations and replace them with a new IR node called ReducedAllocationMergeNode (RAM node).
>> 2) Scalar Replace the incoming allocations to the RAM node.
>> 3) Scalar Replace the RAM node itself.
>> 
>> There are a few conditions for replacing the Phi with a RAM node, though I plan to work on removing them in subsequent PRs:
>> 
>> - The only supported users of the original Phi are AddP->Load, SafePoints/Traps, DecodeN.
>> 
>> These are the critical parts of the implementation, and I'd appreciate it very much if you could tell me whether what I implemented violates any C2 IR constraints:
>> 
>> - The way I identify/use the memory edges used to find the last values stored to the merged objects' fields.
>> - The way I check if there is an incoming Allocate node to the original Phi node.
>> - The way I check if there is no store to the merged objects after they are merged.
>> 
>> Testing:
>> - Windows/Linux/MAC fastdebug/release 
>>   - hotspot_all
>>   - tier1
>>   - Renaissance
>>   - dacapo
>>   - new IR-based tests
>
> Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fix x86 tests.

Thanks for the clarifications, Cesar.

The concept of a RAM node still looks like a hack to me, and I'd like the patch to fit the overall design of EA better rather than trying to work around various peculiarities of the current implementation.

I'll try to elaborate on why I see RAMs as redundant.

As of now, a RAM node serves a dual purpose: it (1) marks a merge point as safe to be untangled during SR, and (2) caches information about field values.
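
For concreteness, here is a small, hypothetical Java shape that produces such a merge point (the Point class and the constant values are illustrative, not taken from the patch):

    // Hypothetical value-like class used only for illustration.
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // Two non-escaping allocations merged at a control-flow join, followed by a
    // field load; after parsing, the base of the Load is a Phi of the two allocations.
    static int merged(boolean cond) {
        Point p;
        if (cond) {
            p = new Point(1, 2);   // allocation A
        } else {
            p = new Point(3, 4);   // allocation B
        }
        return p.x;                // Load whose base is Phi(A, B) -- the merge point
    }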

I believe you can solve it in a cleaner manner without introducing placeholder
nodes and connection graph adjustments. IMO it's all about keeping escape 
status and properly handling "safe" merges in split_unique_types.

One possible way to handle merge points is:
  * Handle merge points in adjust_scalar_replaceable_state and refrain from marking relevant bases as NSR when possible.
  * After adjust_scalar_replaceable_state is over, every merge point should have all its inputs as either NSR or SR. 
  * split_unique_types incrementally builds value phis to eventually replace the base phi at the merge point while processing SR allocations one by one (see the sketch after this list).
  * After split_unique_types is done, there are no merge points anymore, each allocation has a dedicated memory graph and allocation elimination can proceed as before.
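
To sketch the end state (continuing the hypothetical Point example above): once split_unique_types has replaced the base phi with value phis, the Load degenerates into a phi of the stored field values and both allocations can be eliminated as before. Roughly, the result is equivalent to:

    // Sketch of the intended end state after the base Phi is replaced by value phis;
    // no Point allocation remains, only a merge (value phi) of the field values.
    static int mergedAfterSplit(boolean cond) {
        int x;
        if (cond) {
            x = 1;   // value allocation A would have stored into Point.x
        } else {
            x = 3;   // value allocation B would have stored into Point.x
        }
        return x;    // value phi that replaces the Load through the base Phi
    }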

Do you see any problems with such an approach?

One thing still confuses me though: the patch mentions that RAMs can merge both eliminated and not-yet-eliminated allocations. What's the intended use case? I believe it's still required that all merged allocations eventually be eliminated. Do you try to handle the case during allocation elimination when part of the inputs are already eliminated and the rest are pending their turn?

-------------

PR: https://git.openjdk.org/jdk/pull/9073

