RFR: 8295867: TestVerifyGraphEdges.java fails with exit code -1073741571 when using AlwaysIncrementalInline
Christian Hagedorn
chagedorn at openjdk.org
Thu Nov 10 07:23:32 UTC 2022
On Wed, 9 Nov 2022 18:14:43 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:
> When we use -Xcomp we compile `java.lang.invoke.LambdaForm$Kind::<clinit>` very long linear method for enum class: [LambdaForm.java#L250](https://github.com/openjdk/jdk/blame/master/src/java.base/share/classes/java/lang/invoke/LambdaForm.java#L250)
>
> In addition we inline all <init> class initializers for EA when run with -Xcomp: [bytecodeInfo.cpp#L410](https://github.com/openjdk/jdk/blame/master/src/hotspot/share/opto/bytecodeInfo.cpp#L410)
>
> Recent CDS change [JDK-8293979](https://bugs.openjdk.org/browse/JDK-8293979) allows to inline a bit more deeply too.
>
> Adding `-XX:+AlwaysIncrementalInline` worsened situation even more.
>
> Running with `-XX:+LogCompilation` shows that we hit `NodeCountInliningCutoff (18000)` during `java.lang.invoke.LambdaForm$Kind::<clinit>` compilation.
>
> In short, we have very long (>40000 live nodes) linear IR graph. `Node::verify_edges()` method process nodes depth-first starting from first input which is control edge. So it is not surprise that depth of this method recursion reached 6000.
> With frame size of 10 words (320 bytes) we easy hit stack overflow (768K in 32-bits debug VM).
>
> I fixed it by using local buffer `Node_List` instead of recursion in `Node::verify_edges()`.
> The algorithm was changed to simplify code. It processes inputs in reverse order - last input processed first. And I noticed that maximum use of buffer is only about 1000 or less elements for this compilation (that is why I use live_nodes/16 as initial size of buffer).
>
> Then I did additional experiment with keeping recursion but processing inputs in reverse order:
>
>
> // Recursive walk over all input edges
> - for( i = 0; i < len(); i++ ) {
> - n = in(i);
> + for( i = len(); i > 0; i++ ) {
> + n = in(i - 1);
> if( n != NULL )
> in(i)->verify_edges(visited);
> }
>
>
> And it shows the same around 1000 stack depth!
>
> I decided to keep my original fix because it should be faster (put only one value on list instead of putting all locals, PC, SP on stack and calls) and much less stack usage.
>
> Testing tier1-3, hs-comp-stress and `TestVerifyGraphEdges.java` test runs with `-XX:+AlwaysIncrementalInline`.
I agree that a non-recursive solution is preferable in this case. I only have some minor code style comments - otherwise, the fix looks good!
src/hotspot/share/opto/compile.cpp line 4243:
> 4241: // Allocate stack of size C->live_nodes()/16 to avoid frequent realloc
> 4242: uint stack_size = live_nodes() >> 4;
> 4243: Node_List nstack(MAX2(stack_size, (uint)OptoNodeListSize));
As you only need the stack in `verify_edges()`, I suggest to move these lines directly into the method `verify_edges()`.
src/hotspot/share/opto/node.cpp line 2699:
> 2697: uint length = next->len();
> 2698: for (uint i = 0; i < length; i++) {
> 2699: Node* n = next->in(i);
I suggest to rename `n` to `input` to make it easier to see if it is the current node or an input to it.
src/hotspot/share/opto/node.cpp line 2714:
> 2712: // Check for duplicate edges
> 2713: // walk the input array downcounting the input edges to n
> 2714: for(uint j = 0; j < length; j++) {
Suggestion:
for (uint j = 0; j < length; j++) {
src/hotspot/share/opto/node.hpp line 1220:
> 1218: virtual void dump_compact_spec(outputStream *st) const { dump_spec(st); }
> 1219:
> 1220: static void verify_edges(Node* root, Unique_Node_List &visited, Node_List &nstack); // Verify bi-directional edges
Maybe you can directly rename the method according to your comment: `verify_bidirectional_edges()`.
-------------
Marked as reviewed by chagedorn (Reviewer).
PR: https://git.openjdk.org/jdk/pull/11065
More information about the hotspot-compiler-dev
mailing list