RFR: 8340602: C2: LoadNode::split_through_phi might exhaust nodes in case of base_is_phi [v6]

Fri Oct 18 09:36:56 UTC 2024

On Thu, 17 Oct 2024 02:33:58 GMT, Daohan Qu <dqu at openjdk.org> wrote:

>> # Description 
>> 
>> [JDK-6934604](https://github.com/openjdk/jdk/commit/b4977e887a53c898b96a7d37a3bf94742c7cc194) introduces the flag `AggressiveUnboxing` in jdk8u, and [JDK-8217919](https://github.com/openjdk/jdk/commit/71759e3177fcd6926bb36a30a26f9f3976f2aae8) enables it by default in jdk13u.
>> 
>> But it seems that JDK-6934604 forgets to check duplicate `PhiNode` generated in the function `LoadNode::split_through_phi(PhaseGVN *phase)` (in `memnode.cpp`) in the case that `base` is phi but `mem` is not phi. More exactly, `LoadNode::Identity(PhaseTransform *phase)` doesn't search for `PhiNode` in the correct region in that case.
>> 
>> This might cause infinite split in the case of a loop, which is similar to the bugs fixed in [JDK-6673473](https://github.com/openjdk/jdk/commit/30dc0edfc877000c0ae20384f228b45ba82807b7). The infinite split results in "Out of nodes" and make the method "not compilable".
>> 
>> Since JDK-8217919 (in jdk13u), all the later versions of jdks are affected by this bug when the expected optimization pattern appears in the code. For example, the following three micro-benchmarks running with
>> 
>> 
>> make test \
>>   TEST="micro:org.openjdk.bench.java.util.stream.tasks.IntegerMax.Bulk.bulk_seq_inner micro:org.openjdk.bench.java.util.stream.tasks.IntegerMax.Lambda.bulk_seq_lambda micro:org.openjdk.bench.java.util.stream.tasks.IntegerMax.Lambda.bulk_seq_methodRef" \
>>   TEST_OPTS="VM_OPTIONS=-XX:+UseParallelGC"
>> 
>> 
>>  shows performance improvement after this PR applied. (`-XX:+UseParallelGC` is only for reproduce this bug, all the bms in the following table are run with this option.)
>> 
>> |benchmark (throughput, unit: ops/s)| jdk-before-this-patch | jdk-after-this-patch |
>> |---|---|---|
>> |org.openjdk.bench.java.util.stream.tasks.IntegerMax.Bulk.bulk_seq_inner | 26.678 ±(99.9%) 0.574 ops/s | 55.692 ±(99.9%) 4.419 ops/s |
>> |org.openjdk.bench.java.util.stream.tasks.IntegerMax.Lambda.bulk_seq_lambda | 26.792 ±(99.9%) 1.924 ops/s | 64.882 ±(99.9%) 4.175 ops/s |
>> |org.openjdk.bench.java.util.stream.tasks.IntegerMax.Lambda.bulk_seq_methodRef | 27.023 ±(99.9%) 1.116 ops/s | 66.313 ±(99.9%) 0.802 ops/s |
>> 
>> # Reproduction
>> 
>> Compiled and run the reduced test case `Test.java` in the appendix below using 
>> 
>> 
>> java -Xbatch -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation -XX:LogFile=comp.log -XX:+UseParallelGC Test
>> 
>> 
>> and you could find that `Test$Obj.calc` is tagged with `make_not_compilable` and see some output like 
>> 
>> 
>> <fai...
>
> Daohan Qu has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Bug fix

It still works for the cases where `mem->in(0) == base->in(0)`.

It seems that the code that splits through base phi in [`LoadNode::split_through_phi()`](https://github.com/openjdk/jdk/blob/b4977e887a53c898b96a7d37a3bf94742c7cc194/hotspot/src/share/vm/opto/memnode.cpp#L1284) is moved from [`LoadNode::eliminate_autobox()`](https://github.com/openjdk/jdk/blob/7c367a6025f519bf12b5b57c807470555eb0a673/hotspot/src/share/vm/opto/memnode.cpp#L1187) in the commit https://github.com/openjdk/jdk/commit/b4977e887a53c898b96a7d37a3bf94742c7cc194 . And `mem->in(0) == base->in(0)` is what the original code requires:
https://github.com/openjdk/jdk/blob/7c367a6025f519bf12b5b57c807470555eb0a673/hotspot/src/share/vm/opto/memnode.cpp#L1205-L1240

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21134#issuecomment-2421969464