Question about the design of FrameState in Graal IR

Fri Feb 9 19:27:36 UTC 2024

Hi, Gergő, 

Thank you for the explanation. Those two papers also contain a lot of great material on the IR design.
Before your pointers, I only sensed that Graal IR does this for a reason: it gives wiggle room for code motion. But I didn't see the full picture.
For me, it's really hard to distill the idea just by browsing the source code. I think I am starting to understand it with your help.

Thanks,
--lx

On 2/8/24, 5:15 AM, "Gergö Barany" <gergo.barany at oracle.com> wrote:

On 08/02/2024 08:02, Liu, Xin wrote:
> In section 2 of CGO14 “Partial Escape Analysis and Scalar Replacement
> for Java”, there’s a sentence:
>
> “The Graal IR keeps the frame states not at the points where the actual
> deoptimizations take place, but at the points where side effects may
> occur.”
>
> The paper is great, but I can’t find the explanation of this design
> decision.

The following papers provide some background on this:
https://dl.acm.org/doi/10.1145/2542142.2542143 and especially
https://dl.acm.org/doi/10.1145/2647508.2647521

> My intuitive idea is that we generate FrameState at the point where the
> deoptimization arises. This is the most accurate, isn’t it?

Yes, that is the most accurate. But in the Graal IR we are deliberately
inaccurate: When we deoptimize, we can restart interpreter execution
from an earlier point than the root cause of the deoptimization. This
means that the deoptimization slow path can reexecute some code that was
already executed in compiled code. This is not observable because such
code cannot contain side effects.
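
A toy Java sketch of this idea follows. It is not Graal code: the State class, the bci numbering, and the tiny "interpreter" are all hypothetical, made up for illustration only. The compiled path records a state after the last side effect, runs some pure code, and on a failing guard hands that earlier state to the interpreter, which reexecutes the pure code.

```java
import java.util.ArrayList;
import java.util.List;

class DeoptReexecutionSketch {
    /** Hypothetical stand-in for a frame state: resume position plus live locals. */
    static final class State {
        final int bci;
        final int idx;
        State(int bci, int idx) { this.bci = bci; this.idx = idx; }
    }

    static boolean inBounds(int x, int[] a) { return x >= 0 && x < a.length; }

    /** Toy interpreter: bci 0 is the side effect, bci 1 the pure code and the array access. */
    static void interpret(State s, int[] a, List<String> log) {
        if (s.bci == 0) {
            log.add("start");                 // side effect, skipped when resuming at bci 1
        }
        int x = s.idx * 2 + 1;                // pure; runs a second time after a deoptimization
        log.add(inBounds(x, a) ? "value " + a[x] : "out of bounds");
    }

    /** Toy compiled code: speculates that the access is in bounds. */
    static void compiled(int idx, int[] a, List<String> log) {
        log.add("start");                     // side effect
        State afterSideEffect = new State(1, idx);  // state recorded at the side effect
        int x = idx * 2 + 1;                  // pure
        if (!inBounds(x, a)) {                // guard fails: deoptimize
            interpret(afterSideEffect, a, log);     // resume *before* the pure code
            return;
        }
        log.add("value " + a[x]);
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        compiled(5, new int[] {1, 2, 3}, log);
        System.out.println(log);              // prints [start, out of bounds]
    }
}
```

The log contains "start" only once, and it matches what a purely interpreted run would produce, even though x was computed twice.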

> [...]
> Obviously, Graal JIT manages to solve this problem. The PEA phase can
> move allocations wherever it wants. So far, my understanding is that
> Graal IR hides those deoptimization nodes in the high tier.

It doesn't hide deoptimizations, but most of them are represented by
floating guard nodes that can themselves move around the graph and are
not constrained by frame states during the high tier.

> [...]
> If I am still on track, the problem boils down to how to generate the
> correct FrameState for those low-level deoptimization nodes. I guess
> this is why Graal IR designers save FrameStates “at the points where
> side-effect may occur”. The late nodes need anchor points. Whether Graal
> generates a new deoptimization node or moves an existing one to such a
> point, it is easy to find the closest “FrameState” by walking the control
> flow backwards, like a magnet.
>
> All of the above is my conjecture. Could Graal experts confirm my
> understanding? Or could someone point me to literature that describes
> the rationale of the FrameState design?

There are a few state transitions within a Graal compilation that
change what rules apply. Roughly:

- In the high tier, most guards (null checks, bounds checks, etc.) are
floating nodes without frame states attached; frame states are
attached to side effects and control flow merges. PEA only looks at
this level of representation.
- At some point in the mid tier, we find the best places to anchor the
previously floating guards and expand them there to fixed nodes to
express "if (condition) deoptimize" operations. Frame states are still
at side effects. (See GuardLoweringPhase if interested.)
- At some later point in the mid tier, we perform frame state
assignment (FrameStateAssignmentPhase): Each deoptimization point gets
a frame state from the closest dominating side effect. Side effects no
longer have frame states attached. Different deoptimization points may
share the same frame state, which allows us to share deoptimization
metadata (see the second paper linked above).
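
The last step can be sketched in a few lines of Java. This is a deliberately simplified illustration, not the real FrameStateAssignmentPhase: the Node and Kind types and the entryState parameter are hypothetical, and the sketch handles only straight-line code, where the closest dominating side effect is simply the latest preceding one (the real phase also has to deal with control flow merges).

```java
import java.util.List;

class FrameStateAssignmentSketch {
    enum Kind { SIDE_EFFECT, PURE, DEOPT }

    /** Hypothetical IR node; frameState is just a label here, or null if none is attached. */
    static final class Node {
        final Kind kind;
        String frameState;
        Node(Kind kind, String frameState) { this.kind = kind; this.frameState = frameState; }
    }

    /**
     * Walks straight-line code in order. entryState is an assumed state for the
     * method entry, used by deoptimization points that precede every side effect.
     */
    static void assign(List<Node> nodes, String entryState) {
        String current = entryState;
        for (Node n : nodes) {
            if (n.kind == Kind.SIDE_EFFECT) {
                current = n.frameState;   // newest state seen so far
                n.frameState = null;      // side effects no longer carry a state
            } else if (n.kind == Kind.DEOPT) {
                n.frameState = current;   // may be shared with other deopt points
            }
        }
    }
}
```

For example, given the sequence store (state S1), deopt, add, deopt, store (state S2), deopt, the first two deoptimization points both end up with S1 and the third with S2.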

I think your understanding is broadly correct. From the point of view of
(our implementation of) PEA you can mostly ignore these details since
PEA is designed to only run on the highest level representation.

Hope this helps,
Gergö