RFR: 8340602: C2: LoadNode::split_through_phi might exhaust nodes in case of base_is_phi [v3]
Christian Hagedorn
chagedorn at openjdk.org
Wed Sep 25 10:28:39 UTC 2024
On Wed, 25 Sep 2024 07:11:18 GMT, Daohan Qu <dqu at openjdk.org> wrote:
>> # Description
>>
>> [JDK-6934604](https://github.com/openjdk/jdk/commit/b4977e887a53c898b96a7d37a3bf94742c7cc194) introduces the flag `AggressiveUnboxing` in jdk8u, and [JDK-8217919](https://github.com/openjdk/jdk/commit/71759e3177fcd6926bb36a30a26f9f3976f2aae8) enables it by default in jdk13u.
>>
>> But it seems that JDK-6934604 forgets to check duplicate `PhiNode` generated in the function `LoadNode::split_through_phi(PhaseGVN *phase)` (in `memnode.cpp`) in the case that `base` is phi but `mem` is not phi. More exactly, `LoadNode::Identity(PhaseTransform *phase)` doesn't search for `PhiNode` in the correct region in that case.
>>
>> This might cause infinite split in the case of a loop, which is similar to the bugs fixed in [JDK-6673473](https://github.com/openjdk/jdk/commit/30dc0edfc877000c0ae20384f228b45ba82807b7). The infinite split results in "Out of nodes" and make the method "not compilable".
>>
>> Since JDK-8217919 (in jdk13u), all the later versions of jdks are affected by this bug when the expected optimization pattern appears in the code. For example, the following three micro-benchmarks running with
>>
>>
>> make test \
>> TEST="micro:org.openjdk.bench.java.util.stream.tasks.IntegerMax.Bulk.bulk_seq_inner micro:org.openjdk.bench.java.util.stream.tasks.IntegerMax.Lambda.bulk_seq_lambda micro:org.openjdk.bench.java.util.stream.tasks.IntegerMax.Lambda.bulk_seq_methodRef" \
>> MICRO="FORK=1;WARMUP_ITER=2" \
>> TEST_OPTS="VM_OPTIONS=-XX:+UseParallelGC"
>>
>>
>> shows performance improvement after this PR applied. (`-XX:+UseParallelGC` is only for reproduce this bug, all the bms in the following table are run with this option.)
>>
>> |benchmark (throughput, unit: ops/s)| jdk-before-this-patch | jdk-after-this-patch |
>> |---|---|---|
>> |org.openjdk.bench.java.util.stream.tasks.IntegerMax.Bulk.bulk_seq_inner | 26.452 ±(99.9%) 0.185 ops/s | 62.060 ±(99.9%) 0.878 ops/s |
>> |org.openjdk.bench.java.util.stream.tasks.IntegerMax.Lambda.bulk_seq_lambda | 26.922 ±(99.9%) 1.710 ops/s | 67.961 ±(99.9%) 0.850 ops/s |
>> |org.openjdk.bench.java.util.stream.tasks.IntegerMax.Lambda.bulk_seq_methodRef | 28.382 ±(99.9%) 1.021 ops/s | 67.998 ±(99.9%) 0.751 ops/s |
>>
>> # Reproduction
>>
>> Compiled and run the reduced test case `Test.java` in the appendix below using
>>
>>
>> java -Xbatch -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation -XX:LogFile=comp.log -XX:+UseParallelGC Test
>>
>>
>> and you could find that `Test$Obj.calc` is tagged with `make_not_compilable` and...
>
> Daohan Qu has updated the pull request incrementally with one additional commit since the last revision:
>
> Add a jtreg test
Can you explain in more detail how we fail to cut the infinite splitting and how your patch now solves this?
I have some initial comments.
src/hotspot/share/opto/memnode.cpp line 1257:
> 1255: //----------------------is_instance_field_load_with_local_phi------------------
> 1256: bool LoadNode::is_instance_field_load_with_local_phi() {
> 1257: if( in(Memory)->is_Phi() && in(Address)->is_AddP() ) {
Was like that before but usually, we should fix the code style when touching old code like spacing
```suggestion
if (in(Memory)->is_Phi() && in(Address)->is_AddP()) {
or `*` placement (should be at type). There are some more places below which you could also fix with this patch.
src/hotspot/share/opto/memnode.cpp line 1303:
> 1301: }
> 1302:
> 1303: if (is_boxed_value_load_with_local_phi(phase)) {
Does `t_oop->is_ptr_to_boxed_value()` hold here? Should we assert this?
src/hotspot/share/opto/memnode.cpp line 1305:
> 1303: if (is_boxed_value_load_with_local_phi(phase)) {
> 1304: intptr_t ignore = 0;
> 1305: Node * base = AddPNode::Ideal_base_and_offset(in(Address), phase, ignore);
There was a null check before for `base`. Is this not required anymore?
test/hotspot/jtreg/compiler/loopopts/TestInfiniteSplitInCaseOfBaseIsPhi.java line 27:
> 25: * @test
> 26: * @bug 8340602
> 27: * @requires vm.compiler2.enabled
Since you run with Parallel GC, you should add a requires here:
Suggestion:
* @requires vm.compiler2.enabled & & vm.gc.Parallel
test/hotspot/jtreg/compiler/loopopts/TestInfiniteSplitInCaseOfBaseIsPhi.java line 38:
> 36: import java.util.Random;
> 37:
> 38: public class TestInfiniteSplitInCaseOfBaseIsPhi {
You should fix the indentation to 4 spaces for Java code instead of 2.
-------------
PR Review: https://git.openjdk.org/jdk/pull/21134#pullrequestreview-2327402828
PR Review Comment: https://git.openjdk.org/jdk/pull/21134#discussion_r1774709937
PR Review Comment: https://git.openjdk.org/jdk/pull/21134#discussion_r1774968342
PR Review Comment: https://git.openjdk.org/jdk/pull/21134#discussion_r1774960562
PR Review Comment: https://git.openjdk.org/jdk/pull/21134#discussion_r1774705779
PR Review Comment: https://git.openjdk.org/jdk/pull/21134#discussion_r1774707120
More information about the hotspot-compiler-dev
mailing list