RFR: 8325495: C2: implement optimization for series of Add of unique value [v2]

Mon Sep 30 06:46:38 UTC 2024

On Tue, 17 Sep 2024 09:33:38 GMT, Christian Hagedorn <chagedorn at openjdk.org> wrote:

>>> If this is your intention, then please ignore this message.
>> 
>> Yes, this is my intention.
>> 
>> --- 
>> 
>> My previous approach of identifying optimized `Mul->shift + add/sub` (e.g., `a*6` becomes `(a<<1) + (a<<2)` by `MulNode::Ideal()`) was inherently flawed. I was solely determining this with the number of terms. It is not reliable. In the `TestLargeTreeOfSubNodes` example, it replaces already optimized Mul nodes and a new Mul node and repeats the process, causing performance regression (and timeouts).
>> 
>> The new approach matches the exact patterns of optimized `MulNode`s. Additionally, a recursion depth limit of 5 (a rather arbitrary number) is in effect during *iterative* GVN to mitigate the risk of exhausting resources. Untransformed nodes are added to the worklist and will be eventually transformed.
>> 
>> Please note, in the case of `TestLargeTreeOfSubNodes` with flags mentioned above, the compilation is skipped without a large enough `-XX:MaxLabelRootDepth`. This is the same behaviour as the current master. 
>> 
>> Please re-review once GHA is confirmed passing. Thanks!
>
>> Please note, in the case of TestLargeTreeOfSubNodes with flags mentioned above, the compilation is skipped without a large enough -XX:MaxLabelRootDepth. This is the same behaviour as the current master.
> 
> Have you found out why this is the case? I thought that the original fix which added `TestLargeTreeOfSubNodes` wanted to fix the problem of running out of nodes.
> 
> I gave your patch another spin. We still see various failures and timeouts. For example:
> 
> `compiler/intrinsics/sha/TestDigest.java` times out with various flag combinations (for example `-server -Xmixed`). Here is the stack at the timeout:
> 
> 
> Thread 7 (Thread 0x7fc808490700 (LWP 22433)):
> #0  0x00007fc80d648051 in Node::find_integer_type(BasicType) const () from /opt/mach5/mesos/work_dir/jib-master/install/2024-09-17-0714032.christian.hagedorn.jdk-test/linux-x64-debug.jdk/jdk-24/fastdebug/lib/server/libjvm.so
> #1  0x00007fc80c793214 in AddNode::extract_base_operand_from_serial_additions(PhaseGVN*, Node*, Node**, int) () from /opt/mach5/mesos/work_dir/jib-master/install/2024-09-17-0714032.christian.hagedorn.jdk-test/linux-x64-debug.jdk/jdk-24/fastdebug/lib/server/libjvm.so
> #2  0x00007fc80c79306c in AddNode::extract_base_operand_from_serial_additions(PhaseGVN*, Node*, Node**, int) () from /opt/mach5/mesos/work_dir/jib-master/install/2024-09-17-0714032.christian.hagedorn.jdk-test/linux-x64-debug.jdk/jdk-24/fastdebug/lib/server/libjvm.so
> ...
> #90 0x00007fc80c79306c in AddNode::extract_base_operand_from_serial_additions(PhaseGVN*, Node*, Node**, int) () from /opt/mach5/mesos/work_dir/jib-master/install/2024-09-17-0714032.christian.hagedorn.jdk-test/linux-x64-debug.jdk/jdk-24/fastdebug/lib/server/libjvm.so
> #91 0x00007fc80c793082 in AddNode::extract_base_operand_from_serial_additions(PhaseGVN*, Node*, Node**, int) () from /opt/mach5/mesos/work_dir/jib-master/install/2024-09-17-0714032.christian.hagedorn.jdk-test/linux-x64-debug.jdk/jdk-24/fastdebug/lib/server/libjvm.so
> #92 0x00007fc80c79306c in AddNode::extract_base_operand_from_serial_additions(PhaseGVN*, Node*, Node**, int) () from /opt/mach5/mesos/work_dir/jib-master/install/2024-09-17-0714032.christian.hagedorn.jdk-test/linux-x64-debug.jdk/jdk-24/fastdebug/lib/server/libjvm.so
> #93 0x00007fc80c793351 in AddNode::convert_serial_additions(PhaseGVN*, bool, BasicType) () from /opt/mach5/mesos/work_dir/jib-master/install/2024-09-17-0714032.christian.hagedorn.jdk-test/linux-x64-debug.jdk/jdk-24/fastdebug/lib/server/libjvm.so
> #94 0x00007fc80c7937c5 in AddNode...

@chhagedorn 

> Have you found out why this is the case? I thought that the original fix which added TestLargeTreeOfSubNodes wanted to fix the problem of running out of nodes.

I think this is intended. The original fix does prevents running out of nodes during optimization. However, the compilation skipped during code gen, not IGVN. The optimization, after all, correctly unrolls into a large number of nodes which default value of 1100 for `MaxLabelRootDepth` prevents compilation.

> We still see various failures and timeouts. For example: `compiler/intrinsics/sha/TestDigest.java`

This happened because `node->is_Add()` is deceivingly true when `node` is an `Xor[I/L]Node`. I adopted Roland's suggestions, and it's now comparing opcodes directly. 

> I'm also seeing the live node limit assert with test `applications/ctw/modules/java_desktop.java`

There was a problem detecting optimized power-of-2 multiplication where the base term itself is also a `LShiftNode`. It's fixed now.

---

After some discussions with Roland, I decided to not to use recurssion at all to avoid risk of exhausting resource. The latest version matches the exact patterns of multiplications optimized into power-of-2 additions. This should be much safer. It passes all `TEST="tier1 hotspot_compiler tier1_compiler tier2_compiler tier3_compiler tier1_compiler_not_xcomp"` tests. Please give it another run. Thank you very much!

@rwestrel Could you please give it a quick look as well? Thanks!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20754#issuecomment-2382240422