RFR: 8347901: C2 should remove unused leaf / pure runtime calls

Mon May 19 07:24:01 UTC 2025

On Thu, 15 May 2025 21:56:32 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

>> A first step toward better support for pure functions.
>> 
>> ## Pure Functions
>> 
>> Pure functions (as considered here) are functions that have no side effects, have no effect on control flow (no exceptions or the like), cannot deoptimize, and so on. A pure function can be executed anywhere, with any arguments, with no effect other than wasting time. Integer division is not pure, because dividing by zero throws. But many floating-point functions just return `NaN` or `+/-infinity` in problematic cases.
>> 
>> ## Scope
>> 
>> We are not aiming for full generality yet! This PR is mostly about identifying some pure functions and being able to remove them when the result is unused. Some other things are deliberately left out of this PR. In particular, it doesn't propose a way to move pure calls around. The reason is that pure calls are macro nodes that are later expanded into other, regular calls, which require a control input. To be able to do the expansion, we just keep the control in the pure call as well.
>> 
>> ## Implementation Overview
>> 
>> We introduce a new kind of node for pure calls, which is expanded into a regular call during macro expansion. This also allows the removal of the `ModD` and `ModF` nodes, which now have pure equivalents. They are surprisingly hard to unify with other floating-point functions from an implementation point of view!
>> 
>> The IR framework and IGV needed a little bit of fixing.
>> 
>> Thanks,
>> Marc
>
> Interesting! I wasn't aware ADLC already features such support. Thanks for the pointers. 
> 
> It does look attractive, especially for platform-specific use cases. But there are some pitfalls that make it hard to use on its own. In particular, data nodes are aggressively commoned and freely flow in the graph. Unless that is taken into account during GVN and code motion, the final schedule may end up far from optimal. (In other words, it's highly beneficial to match only expensive nodes in such a way.) Moreover, some optimizations are highly sensitive to the presence of calls. (Think of the consequences of a call scheduled inside a heavily vectorized loop.)
> 
> Macro-expansion also suffers from some of those issues, but still IMO an explicit `Call` node is a more appropriate solution to the problem.

Tbh I don't understand @iwanowww's arguments. We have expensive data nodes such as `SqrtD` that have control inputs to prevent them from floating too aggressively. Additionally, a `CallNode` is pinned AT its control input, while a data node is pinned UNDER its control input. That gives the scheduler much more freedom to schedule a data node at a better location than it has for a call node.

Ideally, what we want to do with expensive data nodes is to common them aggressively like any other data node. Then, during code motion, we can clone them if it is beneficial.
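
For readers less familiar with the purity distinction drawn in the quoted PR description, here is a minimal Java-level sketch of it. The class and method names (`PurityDemo`, `intDivThrows`, `fmod`) are made up for illustration and are not part of the PR; the sketch only shows the language-level behavior, not how C2 represents these operations.

```java
// Hypothetical illustration, not code from the PR.
class PurityDemo {
    // Integer division by zero throws, so the operation affects control
    // flow and cannot simply be dropped or executed speculatively.
    static boolean intDivThrows(int a, int b) {
        try {
            int unused = a / b;
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    // Floating-point remainder never throws; problematic inputs yield NaN.
    // If the result is unused, skipping the computation is unobservable.
    static double fmod(double a, double b) {
        return a % b;
    }

    public static void main(String[] args) {
        System.out.println(intDivThrows(5, 0)); // true
        System.out.println(fmod(5.0, 0.0));     // NaN
        System.out.println(Math.sqrt(-1.0));    // NaN
        System.out.println(1.0 / 0.0);          // Infinity
    }
}
```

A call like `fmod(x, y)` whose result is never read is the kind of candidate for removal that the PR description talks about, whereas `a / b` on ints has to stay because of the possible exception.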

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24966#issuecomment-2889891820