RFR: 8347901: C2 should remove unused leaf / pure runtime calls

Mon May 19 07:01:52 UTC 2025

On Tue, 13 May 2025 03:12:29 GMT, Quan Anh Mai <qamai at openjdk.org> wrote:

>>> I think a very simple approach you can take is having CallPureNode as a pure data node
>> 
>> It's not as simple as it seems. In order to work reliably it requires full control of the code being called, so without extra work it is appropriate for generated stubs only. If you want to call some native code VM doesn't control, then either all caller-saved registers should be preserved across the call (which may be prohibitively expensive) or it should be made explicit there's a call taking place so all ABI effects are taken into account.
>
> @iwanowww I believe `effect(CALL)` marks that a call is taking place and the register allocator will know how to save the registers accordingly. Note that on ARM, long division is implemented as a call:
> 
> https://github.com/openjdk/jdk/blob/adebfa7ffda6383f5793278ced14a193066c5f6a/src/hotspot/cpu/arm/arm.ad#L5962
> 
> And `SharedRuntime::ldiv` is implemented in C++:
> 
> https://github.com/openjdk/jdk/blob/adebfa7ffda6383f5793278ced14a193066c5f6a/src/hotspot/share/runtime/sharedRuntime.cpp#L272

I like @merykitty's suggestion, but I don't understand how bad its disadvantages are. Commoning can be prevented, as you mentioned above. As for scheduling, isn't that the same problem we already have for many other nodes? If we have something like

var x = anObject.aField;  // anObject known to be non-null
if (flag) {  // flag independent of `anObject`
  // something with x
} else {
  // [...] nothing with x
}

I don't think there is any ordering constraint between the `if` and the definition of `x`, so we should sink the latter into the `if` branch. Conversely, if the definition is already inside the branch in the original code, we should not let it float above the `if`. And in the case of a loop, we would rather hoist it out as far as possible. But none of that seems to be enforced by edges: the memory node (the load) is not a CFG node, and the nodes in the branches of `if (flag)` might not use memory at all (so there are no memory edges)... The same would be true for an arithmetic node (`AddI`, for instance), but one could argue those are cheap (although inside a loop even cheap becomes expensive), while a memory access is not that cheap.
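To make the placement argument concrete, here is a toy model of the "schedule early / schedule late" placement used by Click-style global code motion, which C2's `gcm.cpp` is roughly modeled on. It is a sketch under simplifying assumptions, not C2's code: the caller supplies the dominator-tree path from the earliest legal block (all inputs available) to the latest legal block (the LCA of all uses), and each block carries a loop depth as a stand-in for the estimated execution frequency C2 actually uses. The class, record, and block names are hypothetical. The node goes to the least deeply nested block on the path, preferring the latest block on ties, so the load of `x` sinks into the `if (flag)` branch unless that would put it inside a loop.

```java
import java.util.List;

// Toy model only; not C2's PhaseCFG scheduling code.
public class GcmPlacement {
    // A basic block with a name and its loop nesting depth (stand-in for frequency).
    record Block(String name, int loopDepth) {}

    /**
     * Picks a block for a floating data node. domPath runs from the earliest
     * legal block (index 0) down the dominator tree to the latest legal block
     * (last index, the LCA of all uses). Chooses the least deeply nested block,
     * preferring the latest one on ties so the node stays off paths that do not need it.
     */
    static Block place(List<Block> domPath) {
        if (domPath.isEmpty()) {
            throw new IllegalArgumentException("empty dominator path");
        }
        Block best = domPath.get(domPath.size() - 1); // start at the latest legal block
        for (int i = domPath.size() - 2; i >= 0; i--) { // walk up toward the earliest block
            Block candidate = domPath.get(i);
            if (candidate.loopDepth() < best.loopDepth()) { // strictly cheaper: hoist
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Block entry = new Block("entry", 0);
        Block thenBranch = new Block("if(flag).then", 0);
        Block loopBody = new Block("loop.body", 1);

        // x used only in the then-branch: the load sinks there.
        System.out.println(place(List.of(entry, thenBranch)).name());
        // x used only inside a loop: the load is hoisted out of the loop.
        System.out.println(place(List.of(entry, loopBody)).name());
    }
}
```

Running `main` prints `if(flag).then` followed by `entry`. The tie-break toward the latest block is what keeps the load off the `else` path, and the strict `<` is what lets loop-invariant work float out of the loop; a pure call modeled as a data node would be placed by exactly the same rule.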
So, don't the problems we would have with @merykitty's pure-call-as-pure-data-node suggestion already exist for other node kinds? And if we would have trouble scheduling pure calls, shouldn't we already be seeing this kind of issue?
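The property the PR is after (an unused pure call disappears just like an unused `AddI`) can be sketched with a toy IR and a worklist-based dead-node pass. This is a hypothetical model, not C2's `Node` API: each node records whether it has side effects, its inputs, and its users. A node with no users and no side effects is removed, and its inputs are revisited because they may have become dead in turn. A call flagged as side-effect free is treated exactly like any other data node, while a call with side effects is kept even when its result is unused.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy IR; not C2's Node class or its dead-code elimination.
public class PureDce {
    static final class Node {
        final String name;
        final boolean hasSideEffects;
        final List<Node> inputs = new ArrayList<>();
        final Set<Node> uses = new LinkedHashSet<>();

        Node(String name, boolean hasSideEffects, Node... ins) {
            this.name = name;
            this.hasSideEffects = hasSideEffects;
            for (Node in : ins) {
                inputs.add(in);
                in.uses.add(this); // keep def-use edges in sync with use-def edges
            }
        }
    }

    /** Removes every node with no uses and no side effects; returns the survivors. */
    static Set<Node> removeDead(Set<Node> graph) {
        Set<Node> live = new LinkedHashSet<>(graph);
        Deque<Node> worklist = new ArrayDeque<>(graph);
        while (!worklist.isEmpty()) {
            Node n = worklist.pop();
            if (!live.contains(n) || n.hasSideEffects || !n.uses.isEmpty()) {
                continue; // already removed, or must be kept
            }
            live.remove(n);
            for (Node in : n.inputs) {
                in.uses.remove(n); // n no longer uses its inputs...
                worklist.push(in); // ...so they may have become dead too
            }
        }
        return live;
    }

    public static void main(String[] args) {
        Node a = new Node("a", false);
        Node b = new Node("b", false);
        Node pureCall = new Node("pureCall", false, a, b);       // result unused, no effects
        Node effectfulCall = new Node("effectfulCall", true, a); // result unused, has effects
        Set<Node> graph = new LinkedHashSet<>(List.of(a, b, pureCall, effectfulCall));
        for (Node n : removeDead(graph)) {
            System.out.println(n.name);
        }
    }
}
```

Running `main` prints `a` and `effectfulCall`: the unused pure call goes away, taking `b` (its only other user) with it, while the side-effecting call and its input survive.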

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24966#issuecomment-2889840427