RFR: 8373453: C2 SuperWord: must handle load slices that have loads with different memory inputs

Tobias Hartmann thartmann at openjdk.org
Tue Jan 6 14:00:37 UTC 2026


On Mon, 5 Jan 2026 07:37:34 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> In `VLoopMemorySlices::find_memory_slices`, we analyze the memory slices. In some cases, we find only loads in the slice, and no phi, so the memory input of the loads comes from before the loop. When I refactored the code, I assumed that all such loads have the same memory input. After all, any store before the loop must have happened before we enter the loop and execute any of its loads. The assumption held for a long time, but now we have a counterexample.
> 
> Summary: one load has its memory input optimized; the other is not put on the IGVN worklist again and keeps the old memory input (even though in this case we could have optimized it just the same). Thus, both choices of memory input are correct, and the assumption of the assert does not hold.
> 
> Solution: Just bail out of auto-vectorization if this assumption is violated. This is an edge case, and the assert has not been hit until the fuzzer found this example.
> Alternatives: we could track the multiple memory inputs, but that would be more effort to implement, and hard to test because it is difficult to create examples.
> 
> ---------------------------
> 
> **Details**
> 
> Below, look at `1145 LoadB` and `1131 LoadB`. One has memory input `Param 7` (initial program state), the other `711 Phi` (outer loop). Both loads are inside the `1147 CountedLoop`, but their memory states come from outside that loop, both originally from `711 Phi`. Then `1145 LoadB` is optimized via `LoadBNode::Ideal` -> `LoadNode::Ideal` -> `LoadNode::split_through_phi`: it realizes that the backedge of the `711 Phi` only passes through the `593 CallStaticJava`, which cannot modify the `Byte::value` field read by the `LoadB` (unless it were to use reflection, but that unlocks undefined behavior anyway, so it can be ignored). So it is ok to split through the phi, as `Byte::value` cannot be modified during the outer loop.
> 
> <img width="1478" height="1356" alt="image" src="https://github.com/user-attachments/assets/2186f190-4198-4868-b903-d84269224d89" />
> 
> `1131 LoadB` could also do the same optimization, but it just does not end up on the IGVN worklist. The issue is that we don't have any adequate notification that goes down through the `MergeMem - Call - Proj` structure. We did not want to have that until now, because in theory we could have a long series of calls, and the traversals could become too expensive.
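
For readers following along, the situation described above can be sketched as a toy model in Java. This is not HotSpot code: the class `LoadSliceSketch`, its `Load` type, and the methods `idealize`, `runWorklist`, and `allLoadsShareMemoryInput` are all hypothetical names invented for illustration, and `split_through_phi` is reduced to simply rewiring a load from `711 Phi` to `Param 7`, which assumes that the optimized load is the one that ends up on `Param 7`. The sketch only shows why two loads in the same load-only slice can disagree on their memory input when just one of them is on the worklist, and what a bail-out check on that condition might look like.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Toy model only: none of these types or methods exist in HotSpot.
final class LoadSliceSketch {

    // A load, identified by name, with a mutable memory input
    // (the name of the memory state node it reads from).
    static final class Load {
        final String name;
        String memoryInput;

        Load(String name, String memoryInput) {
            this.name = name;
            this.memoryInput = memoryInput;
        }
    }

    // Hypothetical stand-in for the split-through-phi optimization:
    // a load on the outer-loop phi is rewired to the initial memory state.
    static void idealize(Load load) {
        if (load.memoryInput.equals("711 Phi")) {
            load.memoryInput = "Param 7";
        }
    }

    // Only loads that are on the worklist are optimized; all others keep
    // their old memory input, even if the same rewrite would be legal.
    static void runWorklist(Deque<Load> worklist) {
        while (!worklist.isEmpty()) {
            idealize(worklist.pop());
        }
    }

    // Bail-out condition for a load-only slice: returns true only if
    // every load in the slice has the same memory input.
    static boolean allLoadsShareMemoryInput(List<Load> slice) {
        if (slice.isEmpty()) {
            return true;
        }
        String first = slice.get(0).memoryInput;
        for (Load load : slice) {
            if (!load.memoryInput.equals(first)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Load optimized = new Load("1145 LoadB", "711 Phi");
        Load stale = new Load("1131 LoadB", "711 Phi");
        List<Load> slice = List.of(optimized, stale);

        System.out.println(allLoadsShareMemoryInput(slice)); // prints "true"

        // Only one of the two loads is revisited.
        Deque<Load> worklist = new ArrayDeque<>();
        worklist.push(optimized);
        runWorklist(worklist);

        System.out.println(allLoadsShareMemoryInput(slice)); // prints "false"
    }
}
```

In this model, a `false` result corresponds to the case where auto-vectorization would be abandoned for the loop rather than asserting.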

Looks good to me too. Great that we have a regression test for this rare case now.

-------------

Marked as reviewed by thartmann (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/29028#pullrequestreview-3630997120

