RFR: 8347901: C2 should remove unused leaf / pure runtime calls

Tue May 20 03:29:54 UTC 2025

On Wed, 30 Apr 2025 13:18:33 GMT, Marc Chevalier <mchevalier at openjdk.org> wrote:

> A first part toward a better support of pure functions.
> 
> ## Pure Functions
> 
> Pure functions (considered here) are functions that have no side effects, no effect on the control flow (no exceptions or the like), cannot deopt, etc. It's really a function that you can execute anywhere, with any arguments, without any effect other than wasting time. Integer division is not pure, as dividing by zero throws. But many floating-point functions will just return `NaN` or `+/-infinity` in problematic cases.
> 
> ## Scope
> 
> We are not going all powerful for now! It's mostly about identifying some pure functions and being able to remove them if the result is unused. Some other things are not part of this PR, on purpose. Especially, this PR doesn't propose a way to move pure calls around. The reason is that pure calls are macro nodes later expanded into other, regular calls, which require a control input. To be able to do the expansion, we just keep the control in the pure call as well.
> 
> ## Implementation Overview
> 
> We created a new node kind for pure calls, which are expanded into regular calls during macro expansion. This also allows the removal of `ModD` and `ModF` nodes, which now have pure equivalents. They are surprisingly hard to unify with other floating-point functions from an implementation point of view!
> 
> IR framework and IGV needed a little bit of fixing.
> 
> Thanks,
> Marc
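
To make the purity distinction in the quoted description concrete, here is a minimal Java-level sketch. It only illustrates language semantics (floating-point remainder yields `NaN`, integer remainder throws); the class and method names are hypothetical and nothing here is taken from the PR's code.

```java
public class PurityDemo {
    // Floating-point remainder never throws: a zero divisor yields NaN.
    static double floatRem(double x, double y) {
        return x % y;
    }

    // Integer remainder throws ArithmeticException when y == 0,
    // so it is not pure in the sense described above.
    static int intRem(int x, int y) {
        return x % y;
    }

    // Helper: reports whether intRem throws for the given arguments.
    static boolean intRemThrows(int x, int y) {
        try {
            intRem(x, y);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Result unused: dropping this call changes nothing observable.
        floatRem(5.0, 0.0);

        System.out.println(Double.isNaN(floatRem(5.0, 0.0))); // true
        System.out.println(intRemThrows(5, 0));               // true
    }
}
```

An unused `floatRem` call can be dropped without changing program behavior, whereas an unused `intRem` call cannot, because the potential exception is observable.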

I'm just pointing out that delaying the lowering decision until the matching phase makes neither scheduling easier nor the implementation simpler.

For loop opts it is important to know when loops contain calls and act accordingly (by trying to hoist relevant nodes out of loops and disabling some optimizations when the calls are still there).

The difference between CFG nodes effectively pinned AT some point and non-CFG nodes with a control dependency (effectively pushing them UNDER their control input) becomes insignificant once CFG nodes depend solely on control. In other words, once a call node doesn't consume/produce memory and I/O states, it becomes straightforward to move it around in the CFG when desired (between its inputs and users).

Speaking of scheduling, would the default scheduling heuristics do a good job? The case of expensive nodes exemplifies the need for custom scheduling heuristics for such nodes.

Implementation-wise, lowering during matching becomes platform-specific and requires each platform to introduce `effect(CALL)` AD instructions. Moreover, each call shape (determined by arity and argument kinds) has to be explicitly handled with a dedicated AD instruction. And it doesn't benefit from the existing support for call nodes that every platform already has.

> Ideally, what we want to do with expensive data nodes is to common them aggressively like any other data node. Then, during code motion, we can clone them if it is beneficial.

The current implementation of expensive nodes can definitely be improved, but the nice property it has is that it only decreases the number of nodes through careful commoning during loop opts. Once cloning is allowed, there's a new problem to care about: the case of too many clones. 

A simple incremental improvement would be to teach `PhaseIdealLoop::process_expensive_nodes()` to push expensive nodes closer to their users if they are on less frequent code paths. Then it can be taught (how and when) to clone expensive nodes between multiple users.
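
As a source-level analogy only (this is not C2 code, and it does not describe what `PhaseIdealLoop::process_expensive_nodes()` currently does), the effect of pushing an expensive computation closer to its user on a less frequent path could be sketched like this. `Math.pow` merely stands in for any costly pure computation; the class and method names are hypothetical.

```java
public class SinkDemo {
    // Expensive value computed eagerly, although only the rare branch uses it.
    static double eager(double x, boolean rare) {
        double e = Math.pow(x, 2.5);
        if (rare) {
            return e + 1.0;
        }
        return x;
    }

    // Same result, but the expensive call sits next to its only user,
    // so the frequent path no longer pays for it.
    static double sunk(double x, boolean rare) {
        if (rare) {
            return Math.pow(x, 2.5) + 1.0;
        }
        return x;
    }

    public static void main(String[] args) {
        System.out.println(eager(4.0, true) == sunk(4.0, true));
        System.out.println(eager(4.0, false) == sunk(4.0, false));
    }
}
```

The transformation is only valid because the computation is pure: with a call that could throw or have side effects, moving it under the branch would change behavior on the frequent path.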

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24966#issuecomment-2892797262