Intermittent JRuby json issue related to tiered or G1

Aleksey Shipilev shade at redhat.com
Wed Feb 17 14:50:44 UTC 2021


On 2/16/21 9:05 PM, Aleksey Shipilev wrote:
> On 2/16/21 9:00 PM, Charles Oliver Nutter wrote:
>> I am a bit confused about your JDK9 reference. If it was fixed in 9
>> why does it reliably reproduce in 15? Perhaps I am misunderstanding
>> the lineage of the fix you are referring to.
> 
> I am saying that there are no direct JIRA hits that could explain why this is happening. The only
> hit I got is for fix already in JDK 9, so it should not happen again.
> 
> I am (slowly) bisecting between JDK 15 and JDK 16 to see which fix directly or accidentally fixed
> it. Then we would know what we are dealing with.

This thing is really hairy. Reverse bisection shows that this change:
   https://bugs.openjdk.java.net/browse/JDK-8257847

...makes the failure in fastdebug much less likely. This explains why I did not see the failures in
JDK 16 and JDK 17 yesterday. I have managed to reliably crash a recent JDK by promoting the assert
in question into a guarantee:

diff --git a/src/hotspot/share/opto/ifnode.cpp b/src/hotspot/share/opto/ifnode.cpp
index 29624765324..467d8f19276 100644
--- a/src/hotspot/share/opto/ifnode.cpp
+++ b/src/hotspot/share/opto/ifnode.cpp
@@ -948,7 +948,9 @@ bool IfNode::fold_compares_helper(ProjNode* proj, ProjNode* success, ProjNode* f
      assert((dom_bool->_test.is_less() && proj->_con) ||
             (dom_bool->_test.is_greater() && !proj->_con), "incorrect test");
      // this test was canonicalized
-    assert(this_bool->_test.is_less() && !fail->_con, "incorrect test");
+    guarantee(this_bool->_test.is_less() && !fail->_con, "incorrect test: dom_bool.test=%d proj._con=%d this_bool.test=%d fail._con=%d",
+           dom_bool->_test._test, proj->_con,
+           this_bool->_test._test, fail->_con);

      cond = (hi_test == BoolTest::le || hi_test == BoolTest::gt) ? BoolTest::gt : BoolTest::ge;


...which then fails with:

# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (ifnode.cpp:955), pid=2438111, tid=2438182
#  guarantee(this_bool->_test.is_less() && !fail->_con) failed: incorrect test: dom_bool.test=3 proj._con=1 this_bool.test=7 fail._con=1
#
# JRE version: OpenJDK Runtime Environment (17.0) (build 17-internal+0-adhoc.shade.jdk)
# Java VM: OpenJDK 64-Bit Server VM (17-internal+0-adhoc.shade.jdk, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0x7fc3ee]  IfNode::fold_compares_helper(ProjNode*, ProjNode*, ProjNode*, PhaseIterGVN*) [clone .part.0]+0x19e
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to /home/shade/temp/jruby/jruby-issue-6554/core.2438111)
#
# An error report file with more information is saved as:
# /home/shade/temp/jruby/jruby-issue-6554/hs_err_pid2438111.log
#
# Compiler replay data is saved as:
# /home/shade/temp/jruby/jruby-issue-6554/replay_pid2438111.log


"this_bool.test=7" means the test is "GE". The downstream code does not expect this: it expects the
test to be canonicalized. This minimal patch bails out when it finds such a bad test:

diff --git a/src/hotspot/share/opto/ifnode.cpp b/src/hotspot/share/opto/ifnode.cpp
@@ -971,6 +973,9 @@ bool IfNode::fold_compares_helper(ProjNode* proj, ProjNode* success, ProjNode* f
          lo = igvn->transform(new AddINode(lo, igvn->intcon(1)));
          cond = BoolTest::ge;
        }
+    } else {
+      // Safety: something is broken, break away.
+      return false;
      }
    } else {
      const TypeInt* failtype = filtered_int_type(igvn, n, proj);


I think I'll submit two issues: one that codes fold_compares_helper more defensively, as in the
patch above (this would be backportable), and a follow-up that addresses the actual problem (why
we have an uncanonicalized test in the first place).


-- 
Thanks,
-Aleksey


