RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v12]
Roberto Castañeda Lozano
rcastanedalo at openjdk.org
Fri Sep 19 13:12:36 UTC 2025
On Tue, 9 Sep 2025 11:27:50 GMT, Roland Westrelin <roland at openjdk.org> wrote:
>> An `Initialize` node for an `Allocate` node is created with a memory
>> `Proj` of adr type raw memory. In order for stores to be captured, the
>> memory state out of the allocation is a `MergeMem` with slices for the
>> various object fields/array element set to the raw memory `Proj` of
>> the `Initialize` node. If `Phi`s need to be created during later
>> transformations from this memory state, The `Phi` for a particular
>> slice gets its adr type from the type of the `Proj` which is raw
>> memory. If during macro expansion, the `Allocate` is found to have no
>> use and so can be removed, the `Proj` out of the `Initialize` is
>> replaced by the memory state on input to the `Allocate`. A `Phi` for
>> some slice for a field of an object will end up with the raw memory
>> state on input to the `Allocate` node. As a result, memory state at
>> the `Phi` is incorrect and incorrect execution can happen.
>>
>> The fix I propose is, rather than have a single `Proj` for the memory
>> state out of the `Initialize` with adr type raw memory, to use one
>> `Proj` per slice added to the memory state after the `Initalize`. Each
>> of the `Proj` should return the right adr type for its slice. For that
>> I propose having a new type of `Proj`: `NarrowMemProj` that captures
>> the right adr type.
>>
>> Logic for the construction of the `Allocate`/`Initialize` subgraph is
>> tweaked so the right adr type captured in is own `NarrowMemProj` is
>> added to the memory sugraph. Code that removes an allocation or moves
>> it also has to be changed so it correctly takes the multiple memory
>> projections out of the `Initialize` node into account.
>>
>> One tricky issue is that when EA split types for a scalar replaceable
>> `Allocate` node:
>>
>> 1- the adr type captured in the `NarrowMemProj` becomes out of sync
>> with the type of the slices for the allocation
>>
>> 2- before EA, the memory state for one particular field out of the
>> `Initialize` node can be used for a `Store` to the just allocated
>> object or some other. So we can have a chain of `Store`s, some to
>> the newly allocated object, some to some other objects, all of them
>> using the state of `NarrowMemProj` out of the `Initialize`. After
>> split unique types, the `NarrowMemProj` is for the slice of a
>> particular allocation. So `Store`s to some other objects shouldn't
>> use that memory state but the memory state before the `Allocate`.
>>
>> For that, I added logic to update the adr type of `NarrowMemProj`
>> during split uni...
>
> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 45 commits:
>
> - more
> - Merge branch 'master' into JDK-8327963
> - more
> - more
> - Merge branch 'master' into JDK-8327963
> - more
> - more
> - lambda return
> - lambda clean up
> - Merge branch 'master' into JDK-8327963
> - ... and 35 more: https://git.openjdk.org/jdk/compare/e16c5100...b701d03e
Changes requested by rcastanedalo (Reviewer).
src/hotspot/share/opto/escape.hpp line 567:
> 565: // MemNode - new memory input for this node
> 566: // CheckCastPP - allocation that this is a cast of
> 567: // allocation - CheckCastPP of the allocation
Please add a new entry here explaining how `_node_map` is used for `NarrowMemProjNode` nodes.
src/hotspot/share/opto/graphKit.cpp line 3645:
> 3643: assert(minit_out->is_Proj() && minit_out->in(0) == init, "");
> 3644: int mark_idx = C->get_alias_index(oop_type->add_offset(oopDesc::mark_offset_in_bytes()));
> 3645: // Add an edge in the MergeMem for the header fields so an access to one of those has correct memory state
Suggestion:
// Add an edge in the MergeMem for the header fields so an access to one of those has correct memory state.
src/hotspot/share/opto/graphKit.cpp line 3647:
> 3645: // Add an edge in the MergeMem for the header fields so an access to one of those has correct memory state
> 3646: // Use one NarrowMemProjNode per slice to properly record the adr type of each slice. The Initialize node will have
> 3647: // multiple projection as a result.
Suggestion:
// multiple projections as a result.
src/hotspot/share/opto/macro.cpp line 1606:
> 1604: // elimination. Simply add the MemBarStoreStore after object
> 1605: // initialization.
> 1606: MemBarNode* mb = MemBarNode::make(C, Op_MemBarStoreStore, Compile::AliasIdxRaw);
Does the same argument as below apply for relaxing the scope of this memory barrier? Please clarify in a similar comment for this case (if the same argument applies, a reference to the comment below would be enough).
src/hotspot/share/opto/macro.cpp line 1623:
> 1621: Node* init_ctrl = init->proj_out_or_null(TypeFunc::Control);
> 1622:
> 1623: // What we want is to prevent the compiler and the cpu from re-ordering the stores that initialize this object
Suggestion:
// What we want is to prevent the compiler and the CPU from re-ordering the stores that initialize this object
src/hotspot/share/opto/macro.cpp line 1628:
> 1626: // only captures/produces a partial memory state making it complicated to insert such a MemBar. Because
> 1627: // re-ordering by the compiler can't happen by construction (a later Store that publishes the just allocated
> 1628: // object reference is indirectly control dependent on the Initialize node), preventing reordering by the cpu is
Suggestion:
// object reference is indirectly control dependent on the Initialize node), preventing reordering by the CPU is
src/hotspot/share/opto/memnode.hpp line 1383:
> 1381: bool already_has_narrow_mem_proj_with_adr_type(const TypePtr* adr_type) const;
> 1382:
> 1383: MachProjNode* mem_mach_proj() const;
Please add a brief comment above this function, possibly clarifying that we do not expect to find more than one Mach memory projection.
src/hotspot/share/opto/multnode.cpp line 73:
> 71: };
> 72: return apply_to_projs(filter, which_proj);
> 73: }
Consider moving this implementation to `multnode.hpp`, perhaps next to that of `MultiNode::apply_to_projs(DUIterator_Fast& imax, DUIterator_Fast& i, Callback callback, uint which_proj)`, for consistency.
src/hotspot/share/opto/multnode.cpp line 279:
> 277: void NarrowMemProjNode::dump_spec(outputStream *st) const {
> 278: ProjNode::dump_spec(st);
> 279: dump_adr_type(st);
Do we need to define a special version of `NarrowMemProjNode::dump_adr_type` or could we just have the same effect calling `MemNode::dump_adr_type(this, _adr_type, st)` here?
src/hotspot/share/opto/multnode.cpp line 284:
> 282: void NarrowMemProjNode::dump_compact_spec(outputStream *st) const {
> 283: ProjNode::dump_compact_spec(st);
> 284: dump_adr_type(st);
Same here.
src/hotspot/share/opto/multnode.hpp line 71:
> 69: }
> 70: Node* current() {
> 71: return _node->fast_out(_i);;
Suggestion:
return _node->fast_out(_i);
src/hotspot/share/opto/multnode.hpp line 90:
> 88: }
> 89: Node* current() {
> 90: return _node->out(_i);;
Suggestion:
return _node->out(_i);
src/hotspot/share/opto/phaseX.cpp line 2621:
> 2619: add_users_to_worklist0(proj, worklist);
> 2620: return MultiNode::CONTINUE;
> 2621: };
Consider defining `enqueue` only once and reusing it in both cases.
test/hotspot/jtreg/compiler/escapeAnalysis/TestIterativeEA.java line 53:
> 51: analyzer.shouldContain("++++ Eliminated: 26 Allocate");
> 52: analyzer.shouldContain("++++ Eliminated: 51 Allocate");
> 53: analyzer.shouldContain("++++ Eliminated: 84 Allocate");
Did you analyze why there are more allocations removed than before in this test case? I did not expect this changeset to have an effect on the number of removed allocations.
test/hotspot/jtreg/compiler/macronodes/TestEarlyEliminationOfAllocationWithoutUse.java line 1:
> 1: /*
Please add a package declaration (and make the corresponding class names fully qualified in the `@run` directives).
test/hotspot/jtreg/compiler/macronodes/TestEliminationOfAllocationWithoutUse.java line 30:
> 28: * Now that array slice depends on the rawslice. And then when the Initialize MemBar gets
> 29: * removed in expand_allocate_common, the rawslice sees that it has now no effect, looks
> 30: * through the MergeMem and sees the initial stae. That way, also the linked array slice
Suggestion:
* through the MergeMem and sees the initial state. That way, also the linked array slice
-------------
PR Review: https://git.openjdk.org/jdk/pull/24570#pullrequestreview-3244667543
PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2362830370
PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2362759304
PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2362760441
PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2362798596
PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2362800147
PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2362800934
PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2362782847
PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2362757140
PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2362743051
PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2362743403
PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2362746650
PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2362750245
PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2362767659
PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2362816473
PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2362810978
PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2362745517
More information about the hotspot-compiler-dev
mailing list