RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v11]
John R Rose
jrose at openjdk.org
Fri Jul 11 18:00:44 UTC 2025
On Fri, 6 Jun 2025 09:01:46 GMT, Roland Westrelin <roland at openjdk.org> wrote:
>> An `Initialize` node for an `Allocate` node is created with a memory
>> `Proj` of adr type raw memory. In order for stores to be captured, the
>> memory state out of the allocation is a `MergeMem` with slices for the
>> various object fields/array element set to the raw memory `Proj` of
>> the `Initialize` node. If `Phi`s need to be created during later
>> transformations from this memory state, The `Phi` for a particular
>> slice gets its adr type from the type of the `Proj` which is raw
>> memory. If during macro expansion, the `Allocate` is found to have no
>> use and so can be removed, the `Proj` out of the `Initialize` is
>> replaced by the memory state on input to the `Allocate`. A `Phi` for
>> some slice for a field of an object will end up with the raw memory
>> state on input to the `Allocate` node. As a result, memory state at
>> the `Phi` is incorrect and incorrect execution can happen.
>>
>> The fix I propose is, rather than have a single `Proj` for the memory
>> state out of the `Initialize` with adr type raw memory, to use one
>> `Proj` per slice added to the memory state after the `Initalize`. Each
>> of the `Proj` should return the right adr type for its slice. For that
>> I propose having a new type of `Proj`: `NarrowMemProj` that captures
>> the right adr type.
>>
>> Logic for the construction of the `Allocate`/`Initialize` subgraph is
>> tweaked so the right adr type captured in is own `NarrowMemProj` is
>> added to the memory sugraph. Code that removes an allocation or moves
>> it also has to be changed so it correctly takes the multiple memory
>> projections out of the `Initialize` node into account.
>>
>> One tricky issue is that when EA split types for a scalar replaceable
>> `Allocate` node:
>>
>> 1- the adr type captured in the `NarrowMemProj` becomes out of sync
>> with the type of the slices for the allocation
>>
>> 2- before EA, the memory state for one particular field out of the
>> `Initialize` node can be used for a `Store` to the just allocated
>> object or some other. So we can have a chain of `Store`s, some to
>> the newly allocated object, some to some other objects, all of them
>> using the state of `NarrowMemProj` out of the `Initialize`. After
>> split unique types, the `NarrowMemProj` is for the slice of a
>> particular allocation. So `Store`s to some other objects shouldn't
>> use that memory state but the memory state before the `Allocate`.
>>
>> For that, I added logic to update the adr type of `NarrowMemProj`
>> during split uni...
>
> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision:
>
> more
> > I still think it would be good to include test cases to confirm that these are not only theoretical concerns, but that should not block the progress of this PR.
>
> Here is a test case … where all the damage is done early on when EA runs. A pass of loop opts before EA fully unrolls the loop and creates memory `Phi`s with incorrect `adr_type` (raw memory). Then EA removes the allocation. All that keeps the `Store` to `field1` alive then is uncommon traps from template predicates. Once they are removed, the `Store` goes away (first round of loop opts after EA).
>
> I'll add that test case to the PR.
I think the moral of this story is: Any single compiler optimization must never depend on "always going first". If unrelated IR transforms do not commute, something will eventually go wrong, when some unpredictable source code requires (or requests) the fragile, non-commutative optimizations to happen in the "wrong order".
Roland, I deeply appreciate your previous comment about fixing root causes, and avoiding shiny workarounds that seem to make the bug go away. (Often the workarounds attempt to restrict the free commutation/reordering of optimizations, by adding contextual checks like extra pattern matching or phase sensistivity.) Such workarounds "seem to work" but usually in the end "fail to work" as well.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/24570#issuecomment-3063227169
More information about the hotspot-compiler-dev
mailing list