RFR: JDK-8316991: Reduce nullable allocation merges

Tobias Hartmann thartmann at openjdk.org
Mon Oct 16 16:18:16 UTC 2023


On Tue, 19 Sep 2023 18:54:34 GMT, Cesar Soares Lucas <cslucas at openjdk.org> wrote:

> ### Description
> 
> Many, if not most, allocation merges (Phis) are nullable because they join object allocations with "NULL", with objects returned from method calls, etc. Please review this Pull Request, which improves the Reduce Allocation Merge (RAM) implementation so that it can reduce at least some of these allocation merges.
> 
> Overall, the improvements are related to 1) making rematerialization of merges able to represent "NULL" objects, and 2) being able to reduce merges used by CmpP/N and CastPP. 
> 
> The approach to reducing CmpP/N and CastPP is pretty similar to that used in the `MemNode::split_through_phi` method: a clone of the node being split is added on each input of the Phi. I make use of `optimize_ptr_compare` and some type information to remove redundant CmpP and CastPP nodes. I added a bunch of ASCII diagrams illustrating what some of the more important methods are doing.
> 
> ### Benchmarking
> 
> **Note:** In some of these tests no reduction happens. I left them in to validate that no performance regression happens in that case.
> **Note 2:** Margin of error was negligible.
> 
> | Benchmark                            |  No RAM (ms/op)  |   Yes RAM (ms/op) |
> |--------------------------------------|------------------|-------------------|
> | TestTrapAfterMerge                   |      19.515      |      13.386       |
> | TestArgEscape                        |      33.165      |      33.254       |
> | TestCallTwoSide                      |      70.547      |      69.427       |
> | TestCmpAfterMerge                    |      16.400      |       2.984       |
> | TestCmpMergeWithNull_Second          |      27.204      |      27.293       |
> | TestCmpMergeWithNull                 |       8.248      |       4.920       |
> | TestCondAfterMergeWithAllocate       |      12.890      |       5.252       |
> | TestCondAfterMergeWithNull           |       6.265      |       5.078       |
> | TestCondLoadAfterMerge               |      12.713      |       5.163       |
> | TestConsecutiveSimpleMerge           |      30.863      |       4.068       |
> | TestDoubleIfElseMerge                |      16.069      |       2.444       |
> | TestEscapeInCallAfterMerge           |      23.111      |      22.924       |
> | TestGlobalEscape                     |      14.459      |      14.425       |
> | TestIfElseInLoop                     |     246.061      |      42.786       |
> | TestLoadAfterLoopAlias               |      45.808      |      45.812       |
> | TestLoadAfterTrap                    |      28.370      |   ...

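For readers skimming the quoted description, a nullable allocation merge and the effect of splitting a compare through it can be pictured at the Java source level. This is only a minimal sketch: the `Point` class and the method names are hypothetical and are not taken from the PR or its benchmarks, and `splitByHand` is a hand-written illustration of the intended result, not what C2 literally emits.

```java
// Illustrative only: all names below are hypothetical, not from the PR.
class NullableMergeSketch {
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // The value of 'p' after the conditional is a Phi that joins an
    // allocation with null, i.e. a nullable allocation merge.
    static int mergedWithNull(boolean cond, int x, int y) {
        Point p = cond ? new Point(x, y) : null;
        if (p != null) {        // pointer compare on the merged value
            return p.x + p.y;   // field loads through the merged value
        }
        return -1;
    }

    // Hand-written picture of cloning the compare onto each Phi input:
    // "new Point(..) != null" is always true and "null != null" is always
    // false, so each branch can be decided without the merged pointer.
    static int splitByHand(boolean cond, int x, int y) {
        if (cond) {
            return x + y;       // allocation no longer needed on this path
        }
        return -1;              // null input: the compare folds to false
    }

    public static void main(String[] args) {
        // Both forms should agree for every input.
        System.out.println(mergedWithNull(true, 3, 4) == splitByHand(true, 3, 4));
        System.out.println(mergedWithNull(false, 3, 4) == splitByHand(false, 3, 4));
    }
}
```

The two methods compute the same result for every input; the second one simply never needs the `Point` object, which is the situation the quoted description aims to make reachable for the compiler.
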
I haven't looked at this in detail yet, but I submitted it for testing and see the following failures.

`compiler/eliminateAutobox/TestByteBoxing.java` with `-ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation`:


# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/workspace/open/src/hotspot/share/opto/loopnode.cpp:2178), pid=951972, tid=951999
#  assert(inner->is_valid_counted_loop(T_INT) && inner->is_strip_mined()) failed: OuterStripMinedLoop should have been removed
#
# JRE version: Java(TM) SE Runtime Environment (22.0) (fastdebug build 22-internal-2023-10-03-0709344.tobias.hartmann.jdk2)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 22-internal-2023-10-03-0709344.tobias.hartmann.jdk2, mixed mode, sharing, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0x128082c]  LoopNode::verify_strip_mined(int) const+0xcc

Current CompileTask:
C2:   1438  263 %  b        compiler.eliminateAutobox.TestByteBoxing::main @ 1358 (1805 bytes)

Stack: [0x00007f0efc9cb000,0x00007f0efcacb000],  sp=0x00007f0efcac57a0,  free space=1001k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x128082c]  LoopNode::verify_strip_mined(int) const+0xcc  (loopnode.cpp:2178)
V  [libjvm.so+0x1256ead]  PathFrequency::to(Node*)+0x70d  (loopPredicate.cpp:988)
V  [libjvm.so+0x1258b49]  PhaseIdealLoop::loop_predication_impl(IdealLoopTree*)+0x8e9  (loopPredicate.cpp:1462)
V  [libjvm.so+0x125989a]  IdealLoopTree::loop_predication(PhaseIdealLoop*)+0x9a  (loopPredicate.cpp:1536)
V  [libjvm.so+0x12a28d7]  PhaseIdealLoop::build_and_optimize()+0xf57  (loopnode.cpp:4582)
V  [libjvm.so+0x9ee7fb]  PhaseIdealLoop::optimize(PhaseIterGVN&, LoopOptsMode)+0x3ab  (loopnode.hpp:1114)
V  [libjvm.so+0x9e9db6]  Compile::Optimize()+0xdf6  (compile.cpp:2362)


`compiler/eliminateAutobox/TestByteBoxing.java` with `-XX:+UnlockDiagnosticVMOptions -XX:-TieredCompilation -XX:+StressArrayCopyMacroNode -XX:+StressLCM -XX:+StressGCM -XX:+StressIGVN -XX:+StressCCP -XX:+StressMethodHandleLinkerInlining -XX:+StressCompiledExceptionHandlers`:


# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (workspace/open/src/hotspot/share/opto/loopnode.cpp:6035), pid=1353611, tid=1353627
#  Error: ShouldNotReachHere()
#
# JRE version: Java(TM) SE Runtime Environment (22.0) (fastdebug build 22-internal-2023-10-03-0709344.tobias.hartmann.jdk2)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 22-internal-2023-10-03-0709344.tobias.hartmann.jdk2, mixed mode, sharing, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0x129062c]  PhaseIdealLoop::verify_strip_mined_scheduling(Node*, Node*)+0x26c

Current CompileTask:
C2:    547   68    b        compiler.eliminateAutobox.TestDoubleBoxing::sump (48 bytes)

Stack: [0x00007f1814966000,0x00007f1814a66000],  sp=0x00007f1814a60c20,  free space=1003k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x129062c]  PhaseIdealLoop::verify_strip_mined_scheduling(Node*, Node*)+0x26c  (loopnode.cpp:6035)
V  [libjvm.so+0x12a0fb0]  PhaseIdealLoop::build_loop_late_post_work(Node*, bool)+0x420  (loopnode.cpp:6222)
V  [libjvm.so+0x12a166d]  PhaseIdealLoop::build_loop_late(VectorSet&, Node_List&, Node_Stack&)+0xbd  (loopnode.cpp:6045)
V  [libjvm.so+0x12a1f9d]  PhaseIdealLoop::build_and_optimize()+0x61d  (loopnode.cpp:4461)
V  [libjvm.so+0x9ee7fb]  PhaseIdealLoop::optimize(PhaseIterGVN&, LoopOptsMode)+0x3ab  (loopnode.hpp:1114)
V  [libjvm.so+0x9e9498]  Compile::Optimize()+0x4d8  (compile.cpp:2354)


The same failures occur with other tests in `compiler/eliminateAutobox/`.

`compiler/intrinsics/unsafe/AllocateUninitializedArray.java` with `-XX:-TieredCompilation -XX:+AlwaysIncrementalInline`:


# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/workspace/open/src/hotspot/share/opto/narrowptrnode.cpp:84), pid=2114638, tid=2114665
#  assert(t != TypeNarrowKlass::NULL_PTR) failed: null klass?
#
# JRE version: Java(TM) SE Runtime Environment (22.0) (fastdebug build 22-internal-2023-10-03-0709344.tobias.hartmann.jdk2)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 22-internal-2023-10-03-0709344.tobias.hartmann.jdk2, mixed mode, sharing, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0x140c554]  DecodeNKlassNode::Value(PhaseGVN*) const+0x1b4

Current CompileTask:
C2:   5582  123             compiler.intrinsics.unsafe.AllocateUninitializedArray::testOK (110 bytes)

Stack: [0x00007fbb8b172000,0x00007fbb8b272000],  sp=0x00007fbb8b26cce0,  free space=1003k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x140c554]  DecodeNKlassNode::Value(PhaseGVN*) const+0x1b4  (narrowptrnode.cpp:84)
V  [libjvm.so+0x12a659a]  PhaseIdealLoop::split_thru_phi(Node*, Node*, int)+0x30a  (loopopts.cpp:103)
V  [libjvm.so+0x12aa620]  PhaseIdealLoop::split_if_with_blocks_pre(Node*)+0x270  (loopopts.cpp:1165)
V  [libjvm.so+0x12af47f]  PhaseIdealLoop::split_if_with_blocks(VectorSet&, Node_Stack&)+0x15f  (loopopts.cpp:1877)
V  [libjvm.so+0x12a291f]  PhaseIdealLoop::build_and_optimize()+0xf9f  (loopnode.cpp:4572)
V  [libjvm.so+0x9ee7fb]  PhaseIdealLoop::optimize(PhaseIterGVN&, LoopOptsMode)+0x3ab  (loopnode.hpp:1114)
V  [libjvm.so+0x9e9d51]  Compile::Optimize()+0xd91  (compile.cpp:2171)

I'm still seeing the following failures:


# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/workspace/open/src/hotspot/share/opto/escape.cpp:1299), pid=1574160, tid=1574500
#  assert(false) failed: SafePointScalarMerge nodes can't be nested.
#
# JRE version: Java(TM) SE Runtime Environment (22.0) (fastdebug build 22-internal-2023-10-12-0503164.tobias.hartmann.jdk2)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 22-internal-2023-10-12-0503164.tobias.hartmann.jdk2, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-aarch64)
# Problematic frame:
# V  [libjvm.so+0xab151c]  ConnectionGraph::verify_ram_nodes(Compile*, Node*)+0x6e8

Current CompileTask:
C2:39141 8262   !   4       akka.actor.ActorCell::invokeAll$1 (577 bytes)

Stack: [0x0000fffea024c000,0x0000fffea044a000],  sp=0x0000fffea0444d50,  free space=2019k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0xab151c]  ConnectionGraph::verify_ram_nodes(Compile*, Node*)+0x6e8  (escape.cpp:1299)
V  [libjvm.so+0x90d1e4]  Compile::Optimize()+0x744  (compile.cpp:2336)
V  [libjvm.so+0x90f098]  Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x1504  (compile.cpp:854)
V  [libjvm.so+0x75b12c]  C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x10c  (c2compiler.cpp:130)
V  [libjvm.so+0x91b124]  CompileBroker::invoke_compiler_on_method(CompileTask*)+0x8e4  (compileBroker.cpp:2282)
V  [libjvm.so+0x91bc3c]  CompileBroker::compiler_thread_loop()+0x5bc  (compileBroker.cpp:1943)
V  [libjvm.so+0xdb4bc0]  JavaThread::thread_main_inner()+0xec  (javaThread.cpp:720)
V  [libjvm.so+0x1600764]  Thread::call_run()+0xb0  (thread.cpp:220)
V  [libjvm.so+0x1368ff8]  thread_native_entry(Thread*)+0x138  (os_linux.cpp:785)
C  [libc.so.6+0x82a28]  start_thread+0x2d4


# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/workspace/open/src/hotspot/share/opto/narrowptrnode.cpp:84), pid=3481386, tid=3481478
#  assert(t != TypeNarrowKlass::NULL_PTR) failed: null klass?
#
# JRE version: Java(TM) SE Runtime Environment (22.0) (fastdebug build 22-internal-2023-10-12-0503164.tobias.hartmann.jdk2)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 22-internal-2023-10-12-0503164.tobias.hartmann.jdk2, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0x140fcf4]  DecodeNKlassNode::Value(PhaseGVN*) const+0x1b4

Current CompileTask:
C2:44601 8049       4       akka.dispatch.NodeMessageQueue::cleanUp (32 bytes)

Stack: [0x00007f90834f6000,0x00007f90835f6000],  sp=0x00007f90835f0d00,  free space=1003k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x140fcf4]  DecodeNKlassNode::Value(PhaseGVN*) const+0x1b4  (narrowptrnode.cpp:84)
V  [libjvm.so+0x12aa37a]  PhaseIdealLoop::split_thru_phi(Node*, Node*, int)+0x30a  (loopopts.cpp:103)
V  [libjvm.so+0x12ae400]  PhaseIdealLoop::split_if_with_blocks_pre(Node*)+0x270  (loopopts.cpp:1165)
V  [libjvm.so+0x12b325f]  PhaseIdealLoop::split_if_with_blocks(VectorSet&, Node_Stack&)+0x15f  (loopopts.cpp:1877)
V  [libjvm.so+0x12a66ff]  PhaseIdealLoop::build_and_optimize()+0xf9f  (loopnode.cpp:4572)
V  [libjvm.so+0x9f940b]  PhaseIdealLoop::optimize(PhaseIterGVN&, LoopOptsMode)+0x3ab  (loopnode.hpp:1112)
V  [libjvm.so+0x9f4991]  Compile::Optimize()+0xd91  (compile.cpp:2171)
V  [libjvm.so+0x9f81e0]  Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x1b90  (compile.cpp:854)
V  [libjvm.so+0x848bc9]  C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x159  (c2compiler.cpp:130)
V  [libjvm.so+0xa040d0]  CompileBroker::invoke_compiler_on_method(CompileTask*)+0x980  (compileBroker.cpp:2282)
V  [libjvm.so+0xa04e58]  CompileBroker::compiler_thread_loop()+0x508  (compileBroker.cpp:1943)
V  [libjvm.so+0xebf52c]  JavaThread::thread_main_inner()+0xcc  (javaThread.cpp:720)
V  [libjvm.so+0x1793bea]  Thread::call_run()+0xba  (thread.cpp:220)
V  [libjvm.so+0x14a20da]  thread_native_entry(Thread*)+0x12a  (os_linux.cpp:785)


Unfortunately, they happen with an internal stress test based on the Renaissance Benchmark that I can't share.

-------------

Changes requested by thartmann (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/15825#pullrequestreview-1654667506
PR Comment: https://git.openjdk.org/jdk/pull/15825#issuecomment-1764086364

