RFR: 8345766: C2 should emit macro nodes for ModF/ModD instead of calls during parsing [v6]
Quan Anh Mai
qamai at openjdk.org
Sat Jan 11 02:32:35 UTC 2025
On Thu, 9 Jan 2025 14:35:58 GMT, Theo Weidmann <tweidmann at openjdk.org> wrote:
>> C2 currently emits runtime calls if the platform rules do not support lowering floating-point remainder operations. For example, for float:
>>
>> https://github.com/openjdk/jdk/blob/fbbc7c35f422294090b8c7a02a19ab2fb67c7070/src/hotspot/share/opto/parse2.cpp#L2305-L2318
>>
>> https://github.com/openjdk/jdk/blob/fbbc7c35f422294090b8c7a02a19ab2fb67c7070/src/hotspot/share/opto/parse2.cpp#L1099-L1109
>>
>> The only platform that currently supports this, however, is x86_32. On all other platforms, runtime calls are generated directly during parsing, which prevents any constant folding or other idealization. Even C1 can perform these optimizations, so C2 performs significantly worse than C1 on simple test cases. This function was observed to be around 15x slower with C2 than with C1 due to redundant runtime calls:
>>
>>
>> public static double process(final double x) {
>> double w = (double) 0.1;
>> double p = 0;
>> p = (double) (3.109615012413746E307 % (w % Z));
>> p = (double) (7.614949555185036E307 / (x % x)); // <- return value only depends on this line
>> return (double) (x * p);
>> }
>>
>>
>> To fix this, this PR turns ModFNode and ModDNode into macro nodes, which are always created during parsing. They support idealization (constant folding) and are lowered to runtime calls during macro expansion. For simplicity, these operations will now also call into the runtime on x86_32, as this platform is deprecated.
>
> Theo Weidmann has updated the pull request incrementally with three additional commits since the last revision:
>
> - Address comments
> - Actually return top
> - Update divnode.cpp

Sorry for being late here. What do you think about using a general-purpose `CallPureNode` that represents a call that neither reads nor writes externally modifiable state? Apart from `ModF` and `ModD`, several other nodes could benefit from this, such as the trigonometric functions, SVML calls, etc. A `CallPureNode` has no control or memory inputs or outputs, which makes it more amenable to GVN and dead-code elimination and allows it to be scheduled more freely.
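To make the GVN argument concrete, here is a toy Java model of hash-consing (value numbering). It is hypothetical: the class, records, and opcode names such as `CallPure_fmod`, `CallLeaf_fmod`, `CtrlProj`, and `MemProj` are invented for illustration and are not HotSpot's API. Nodes are keyed on opcode plus inputs, so two pure calls with the same data inputs collapse into one node, while calls threaded through control and memory see different inputs and stay distinct. The sketch models only value numbering, not dead-code elimination or scheduling, and it ignores any other reasons a real compiler may decline to common calls.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy hash-consing (value numbering) model; hypothetical, not HotSpot code.
public class PureCallGvnSketch {
    // A node: unique id, opcode, and the ids of its input nodes.
    record Node(int id, String op, List<Integer> inputs) {}

    // Hash key: nodes with the same opcode and inputs are interchangeable.
    record Key(String op, List<Integer> inputs) {}

    private final Map<Key, Node> table = new HashMap<>();
    private int nextId = 0;

    // Returns an existing equivalent node if present, otherwise creates one.
    Node make(String op, Integer... inputs) {
        Key key = new Key(op, List.of(inputs));
        return table.computeIfAbsent(key, k -> new Node(nextId++, k.op(), k.inputs()));
    }

    public static void main(String[] args) {
        PureCallGvnSketch gvn = new PureCallGvnSketch();
        Node x = gvn.make("Parm0");

        // Pure call: data inputs only, so a second x % x is commoned.
        Node pure1 = gvn.make("CallPure_fmod", x.id(), x.id());
        Node pure2 = gvn.make("CallPure_fmod", x.id(), x.id());
        System.out.println(pure1 == pure2); // true

        // Conventional call: also takes control and memory. The first call
        // produces new control and memory projections, so the second call
        // has different inputs and is not commoned in this model.
        Node ctrl0 = gvn.make("Start");
        Node mem0 = gvn.make("InitialMemory");
        Node call1 = gvn.make("CallLeaf_fmod", ctrl0.id(), mem0.id(), x.id(), x.id());
        Node ctrl1 = gvn.make("CtrlProj", call1.id());
        Node mem1 = gvn.make("MemProj", call1.id());
        Node call2 = gvn.make("CallLeaf_fmod", ctrl1.id(), mem1.id(), x.id(), x.id());
        System.out.println(call1 == call2); // false
    }
}
```

In this model, removing the control and memory edges is exactly what lets equal computations hash to the same key.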
-------------
PR Comment: https://git.openjdk.org/jdk/pull/22786#issuecomment-2585016091
More information about the hotspot-compiler-dev mailing list