RFC : Approach to handle Allocation Merges in C2 Scalar Replacement
Vladimir Ivanov
vladimir.x.ivanov at oracle.com
Fri Feb 11 18:24:42 UTC 2022
> For this simple case we can teach C2's IGVN to split field loads
> through the Phi so that phi(p0, p1) and the allocations become unused.
> We can do that because we know that the allocations and the phi do not
> escape.
Seems like the split_unique_types() pass is a prerequisite to make such
transformations predictable on a larger scale. It simplifies the memory
graph and untangles instance-related memory operations from the rest of
the graph.
Otherwise, I believe wide memory states/merges would pose serious
problems. E.g., when a load is split through a phi, you need to pick the
correct memory states for the new loads above the phi, which quickly
becomes quite a challenging task.
Best regards,
Vladimir Ivanov
>
> public int ex1(boolean cond, int first, int second) {
>     p0 = Allocate(...);
>     ...
>     p0.x = first;
>     p0.y = second;
>
>     if (cond) {
>         p1 = Allocate(...);
>         ...
>         p1.x = second;
>         p1.y = first;
>     }
>
>     p = phi(p0, p1) // unused and will be removed
>
>     return phi(p0.x, p1.x) - phi(p0.y, p1.y);
> }
>
> Thanks,
> Vladimir K
>
> On 2/10/22 11:29 AM, Vladimir Ivanov wrote:
>> (BCCing hotspot-dev and moving the discussion to hotspot-compiler-dev.)
>>
>> Hi Cesar,
>>
>> Thanks for looking into enhancing EA.
>>
>> Overall, the proposal looks reasonable.
>>
>> I suggest looking more closely at split_unique_types().
>> It introduces a dedicated class of alias categories for fields of the
>> allocation being eliminated and clones the memory graph. I don't see
>> why it shouldn't work for multiple allocations.
>>
>> Moreover, split_unique_types() will break if you start optimizing
>> multiple allocations at once. The notion of a unique alias should be
>> adjusted to cover the union of unique aliases for all interacting
>> allocations.
>>
>> Seems like you need to enhance SR to work on non-intersecting clusters
>> of allocations.
>>
>> One thing to take care of: scalar replacement relies on
>> TypeOopPtr::instance_id().
>>
>>   // If not InstanceTop or InstanceBot, indicates that this is
>>   // a particular instance of this type which is distinct.
>>   // This is the node index of the allocation node creating this instance.
>>   int _instance_id;
>>
>> It'll break when multiple allocations are in play.
>>
>> Best regards,
>> Vladimir Ivanov
>>
>> On 09.02.2022 04:45, Cesar Soares Lucas wrote:
>>> Hi there again!
>>>
>>> Can you please give me feedback on the following approach to at least
>>> partially
>>> address [1], the scalar replacement allocation merge issue?
>>>
>>> The problem that I am trying to solve arises when allocations are
>>> merged after a
>>> control flow split. The code below shows _one example_ of such a
>>> situation.
>>>
>>> public int ex1(boolean cond, int x, int y) {
>>>     Point p = new Point(x, y);
>>>     if (cond)
>>>         p = new Point(y, x);
>>>     // Allocations for p are merged here.
>>>     return p.calc();
>>> }
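For reference, a minimal Point class consistent with the example above could look as follows. This is a hypothetical sketch (the original message does not define Point); `calc()` is written as `x - y` to match the `p.x - p.y` shape of the IR shown later:

```java
// Hypothetical Point class matching the ex1 example; the original
// message does not define it.
class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // Small enough to be inlined by C2; once calc() is inlined,
    // the Point allocations in ex1 do not escape.
    int calc() {
        return x - y;
    }
}
```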
>>>
>>> Assuming the method calls on "p" are inlined, the allocations will
>>> not escape the method. The C2 IR for this method will look like this:
>>>
>>> public int ex1(boolean cond, int first, int second) {
>>>     p0 = Allocate(...);
>>>     ...
>>>     p0.x = first;
>>>     p0.y = second;
>>>
>>>     if (cond) {
>>>         p1 = Allocate(...);
>>>         ...
>>>         p1.x = second;
>>>         p1.y = first;
>>>     }
>>>
>>>     p = phi(p0, p1)
>>>
>>>     return p.x - p.y;
>>> }
>>>
>>> However, one of the constraints implemented here [2], specifically
>>> the third
>>> one, will prevent the objects from being scalar replaced.
>>>
>>> The approach that I'm considering for solving the problem is to
>>> replace the Phi
>>> node `p = phi(p0, p1)` with new Phi nodes for each of the fields of
>>> the objects
>>> in the original Phi. The IR for `ex1` would look something like this
>>> after the
>>> transformation:
>>>
>>> public int ex1(boolean cond, int first, int second) {
>>>     p0 = Allocate(...);
>>>     ...
>>>     p0.x = first;
>>>     p0.y = second;
>>>
>>>     if (cond) {
>>>         p1 = Allocate(...);
>>>         ...
>>>         p1.x = second;
>>>         p1.y = first;
>>>     }
>>>
>>>     pX = phi(first, second)
>>>     pY = phi(second, first)
>>>
>>>     return pX - pY;
>>> }
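The before/after shapes can be compared directly in plain Java to see that the transformation preserves semantics. This is only an illustrative sketch: the array stands in for the merged object, and the ternaries stand in for the per-field phis:

```java
class SplitPhiDemo {
    // Shape before the transformation: an object is allocated on each
    // branch and the two allocations are merged by a phi.
    static int ex1Merged(boolean cond, int first, int second) {
        int[] p = new int[] { first, second };  // stands in for p0
        if (cond) {
            p = new int[] { second, first };    // stands in for p1
        }
        return p[0] - p[1];                     // p.x - p.y
    }

    // Shape after the transformation: one phi per field, and the
    // objects themselves are never merged.
    static int ex1Split(boolean cond, int first, int second) {
        int pX = cond ? second : first;  // pX = phi(first, second)
        int pY = cond ? first : second;  // pY = phi(second, first)
        return pX - pY;
    }
}
```

With no object merge left, both allocations become candidates for scalar replacement.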
>>>
>>> I understand that this transformation might not be applicable for all
>>> cases and
>>> that it's not as simple as illustrated above. Also, it seems to me
>>> that much of
>>> what I'd have to implement is already implemented in other steps of
>>> the Scalar
>>> Replacement pipeline (which is a good thing). To work around these
>>> implementation details I plan to use as much of the existing code as
>>> possible.
>>> The algorithm for the transformation would be like this:
>>>
>>> split_phis(phi)
>>>     # If the output of phi escapes, or something uses its identity,
>>>     # etc., then we can't remove it. The conditions here might
>>>     # possibly be the same as the ones implemented in
>>>     # `PhaseMacroExpand::can_eliminate_allocation`.
>>>     if cant_remove_phi_output(phi)
>>>         return
>>>
>>>     # Collect a set of tuples (F, U) containing nodes U that use
>>>     # field F of the object resulting from `phi`.
>>>     fields_used = collect_fields_used_after_phi(phi)
>>>
>>>     foreach field in fields_used
>>>         producers = {}
>>>
>>>         # Create a list with the last Store to "field" in the scope
>>>         # of each of the Phi input objects.
>>>         foreach o in phi.inputs
>>>             # The function called below might re-use a lot of the
>>>             # code/logic in `PhaseMacroExpand::scalar_replacement`.
>>>             producers += last_store_to_o_field(o, field)
>>>
>>>         # Create a new phi node whose inputs are the Stores to 'field'.
>>>         field_phi = create_new_phi(producers)
>>>
>>>         update_consumers(field, field_phi)
>>>
>>> The implementation that I envisioned would be a "pre-process" [3]
>>> step just after EA but before the constraint checks in
>>> `adjust_scalar_replaceable_state` [2]. If we agree that the overall
>>> Scalar Replacement implementation goes through the following major
>>> phases:
>>>
>>> 1. Identify the Escape Status of objects.
>>> 2. Adjust object Escape and/or Scalar Replacement status based on a
>>>    set of constraints.
>>> 3. Make a call to split_unique_types() [4].
>>> 4. Iterate over object and array allocations.
>>>    4.1. Check if the allocation can be eliminated.
>>>    4.2. Perform scalar replacement. Replace uses of the object in
>>>         Safepoints.
>>>    4.3. Process users of CheckCastPP other than Safepoint: AddP,
>>>         ArrayCopy and CastP2X.
>>>
>>> The transformation that I am proposing would change the overall flow
>>> to look like this:
>>>
>>> 1. Identify the Escape Status of objects.
>>> 2. ----> New: "Split phi functions" <----
>>> 3. Adjust object Escape and/or Scalar Replacement status based on a
>>>    set of constraints.
>>> 4. Make a call to split_unique_types() [4].
>>> 5. Iterate over object and array allocations.
>>>    5.1. ----> Moved to split_phis: "Check if the allocation can be
>>>         eliminated" <----
>>>    5.2. Perform scalar replacement. Replace uses of the object in
>>>         Safepoints.
>>>    5.3. Process users of CheckCastPP other than Safepoint: AddP,
>>>         ArrayCopy and CastP2X.
>>>
>>> Please let me know what you think and thank you for taking the time
>>> to review
>>> this!
>>>
>>>
>>> Regards,
>>> Cesar
>>>
>>> Notes:
>>>
>>> [1] I am not sure yet how this approach will play with the case of a
>>>     merge with NULL.
>>>
>>> [2] https://github.com/openjdk/jdk/blob/2f71a6b39ed6bb869b4eb3e81bc1d87f4b3328ff/src/hotspot/share/opto/escape.cpp#L1809
>>>
>>> [3] Another option would be to "patch" the current implementation to
>>>     be able to handle the merges. I am not certain that the "patch"
>>>     approach would be better; however, the "pre-process" approach is
>>>     certainly much easier to test and more readable.
>>>
>>> [4] I cannot say I understand 100% the effects of executing
>>>     split_unique_types(). Would the transformation that I am
>>>     proposing need to be after the call to split_unique_types()?