[Bug 1882] New: Dangerous interplay between GC, JIT and JVM resulting in JVM crash

bugzilla-daemon at icedtea.classpath.org bugzilla-daemon at icedtea.classpath.org
Wed Jul 23 22:51:38 UTC 2014


            Bug ID: 1882
           Summary: Dangerous interplay between GC, JIT and JVM resulting
                    in JVM crash
           Product: IcedTea
           Version: 7-hg
          Hardware: arm
                OS: Linux
            Status: NEW
          Severity: blocker
          Priority: P5
         Component: Zero
          Assignee: gnu.andrew at redhat.com
          Reporter: william at autoletics.com
                CC: unassigned at icedtea.classpath.org

#  Internal Error (os_linux_zero.cpp:285), pid=21482, tid=1251472496
#  fatal error: caught unhandled signal 4
# JRE version: 7.0_25-b30
# Java VM: OpenJDK Zero VM (23.7-b01 mixed mode linux-arm )

# A fatal error has been detected by the Java Runtime Environment:
#  Internal Error (os_linux_zero.cpp:270), pid=23439, tid=1212884080
#  fatal error: caught unhandled signal 4
# JRE version: 6.0_24-b24
# Java VM: OpenJDK Zero VM (20.0-b12 mixed mode linux-arm )

I would greatly appreciate any suggestions and possible follow-ups regarding a
reproducible issue with the above releases running in an embedded context.
There seems to be a very dangerous interplay going on between the GC, the JVM,
and (possibly) the JIT. If an application repeatedly performs any sort of
allocation, the JVM eventually crashes, sometimes with signal 4 and sometimes
with signal 11. This is not a memory leak, because I can crash the JVM with
even the simplest of code paths, such as:


Strangely enough, if all that is created repeatedly is an Object[] array of a
fixed size, it takes far longer for the crash to occur, sometimes long enough
that I can't be sure it will (ever) happen, and so I terminate the process.

It would appear that only the smallest amount of object, code, and stack motion
is needed for this problem to rear its head, such as the line above.

The only sensible workaround I've discovered is to turn off the JIT by passing
-Xint on the command line; the problem seems to disappear completely with this
option. This leads to an even stranger observation: if I run multiple JVMs on
the same machine, with all but one spinning on some non-allocating operation
such as System.nanoTime(), the one JVM performing repeated object allocation
will cause the other JVMs to crash as it crashes. The spinning JVMs crash with
signal 4 and the allocating one with signal 11. This is not only reproducible,
it all happens within an instant of the first crash. If I run the CPU-spinning
JVMs with the -Xint flag but leave the allocating JVM without it, then *ONLY*
the allocating JVM crashes. I can even run several allocating JVMs alongside
each other, as long as I add the -Xint flag. Any JVM without this flag will
crash and will take the other JVMs without the flag down with it.

The non-sensible workaround was to call System.gc() after each loop iteration
of the test case, but even then, if I skipped an iteration, the JVM would
crash. Of course this is not at all feasible for any application beyond
HelloWorld, but it does hopefully hint at something.

Please note that I am not running the original application code under which
this issue was first observed, but the simplest of test case code I created to
confirm the issue.

I could speculate that -Xint resolves the matter because it slows the JVM down
sufficiently for the GC to catch up, but that does not explain how one JVM can
cause other JVMs to crash, near simultaneously, when those JVMs are *NOT*
making allocation calls and do not have -Xint on the command line. All JVMs
seem to spin along happily as long as there is no allocation on the call
stack. Once there is some degree of allocation and minor GC collection, the
system trips up. It is as if there is some uncontrolled movement (a race
condition) in heap/stack memory that is not being coordinated (or controlled)
with the JIT, and/or it triggers a fatal call path between HotSpot and Zero,
and this lives in some shared memory space of sorts, hence the spreading to
other JVMs.

During some of the test runs a NullPointerException was thrown and reported
just before the JVM crashed, but it happened in different places, some of them
within the JDK classes at locations where it should be impossible for a null
value to appear after being guarded by a null check further up the call stack.
I also ran tests with a bytecode instrumentation agent and got the occasional
NullPointerException reported within the agent code, again at locations where
the reference could not legitimately be null.

Here are some samples of the top of the stacks reported in various error logs.

Stack: [0x48333000,0x484b3000],  sp=0x483f4884,  free space=774k
Java frames:
 0x484b1a6c: stack_word[1]         = 0x433cdeb8
 0x484b1a70: stack_word[0]         = 0x433ce8b0
 0x484b1a74: istate->_thread       = 0x000a4bf8
 0x484b1a78: istate->_bcp          = 0x43a17ee6 (bci 22)
 0x484b1a7c: istate->_locals       = 0x484b1ac8
 0x484b1a80: istate->_constants    = 0x43a90768
 0x484b1a84: istate->_method       =
 0x484b1a88: istate->_mdx          = 0x43a7f348
 0x484b1a8c: istate->_stack        = 0x484b1a68
 0x484b1a90: istate->_msg          = 0x00000000
 0x484b1a94: istate->_result       = 0x484b1a58
 0x484b1a98: (istate->_result)     = 0x00000000
 0x484b1a9c: (istate->_result)     = 0x484b1a60
 0x484b1aa0: istate->_prev_link    = 0x484b1a60
 0x484b1aa4: istate->_oop_temp     = 0x00000000
 0x484b1aa8: istate->_stack_base   = 0x484b1a74
 0x484b1aac: istate->_stack_limit  = 0x484b1a68
 0x484b1ab0: istate->_monitor_base = 0x484b1a74
 0x484b1ab4: istate->_self_link    = 0x484b1a74
 0x484b1ab8: frame_type            = INTERPRETER_FRAME
 0x484b1abc: next_frame            = 0x484b1b14


Stack: [0x47cbb000,0x47d3a000],  sp=0x47d38254,  free space=500k
JavaThread 0x001f05f0 (nid = 4159) was being processed
Java frames:
 0x48b605d0: stack_word[4]         = 0x00bcc1f2
 0x48b605d4: stack_word[3]         = 0x00000009
 0x48b605d8: stack_word[2]         = 0x00000066
 0x48b605dc: stack_word[1]         = 0x435eb950
 0x48b605e0: stack_word[0]         = 0x4357f8d0
 0x48b605e4: istate->_thread       = 0x001f05f0
 0x48b605e8: istate->_bcp          = 0x43d1e4f7 (bci 383)
 0x48b605ec: istate->_locals       = 0x48b60644
 0x48b605f0: istate->_constants    = 0x43d21008
 0x48b605f4: istate->_method       =
 0x48b605f8: istate->_mdx          = 0x48b60648
 0x48b605fc: istate->_stack        = 0x48b605dc
 0x48b60600: istate->_msg          = 0x00000000
 0x48b60604: istate->_result       = 0x43d31569
 0x48b60608: (istate->_result)     = 0x48b60650
 0x48b6060c: (istate->_result)     = 0x43d32778
 0x48b60610: istate->_prev_link    = 0x43d31590
 0x48b60614: istate->_oop_temp     = 0x00000000
 0x48b60618: istate->_stack_base   = 0x48b605e4
 0x48b6061c: istate->_stack_limit  = 0x48b605cc
 0x48b60620: istate->_monitor_base = 0x48b605e4
 0x48b60624: istate->_self_link    = 0x48b605e4
 0x48b60628: frame_type            = INTERPRETER_FRAME
 0x48b6062c: next_frame            = 0x48b60690


Stack: [0x48be9000,0x48d69000],  sp=0x48caa804,  free space=774k
Java frames:
 0x48d67814: stack_word[3]         = 0x018bd428
 0x48d67818: stack_word[2]         = 0x00000000
 0x48d6781c: stack_word[1]         = 0x000fad80
 0x48d67820: stack_word[0]         = 0x435c5c50
 0x48d67824: monitor[0]->_lock     = 0x00000003
 0x48d67828: monitor[0]->_obj      = 0x439434c0
 0x48d6782c: istate->_thread       = 0x0013a798
 0x48d67830: istate->_bcp          = 0x443f2c5c (bci 68)
 0x48d67834: istate->_locals       = 0x48d67890
 0x48d67838: istate->_constants    = 0x443f5078
 0x48d6783c: istate->_method       =
 0x48d67840: istate->_mdx          = 0x00000003
 0x48d67844: istate->_stack        = 0x48d6781c
 0x48d67848: istate->_msg          = 0x00000000
 0x48d6784c: istate->_result       = 0x443d9330
 0x48d67850: (istate->_result)     = 0x48d67898
 0x48d67854: (istate->_result)     = 0x443db800
 0x48d67858: istate->_prev_link    = 0x443d9360
 0x48d6785c: istate->_oop_temp     = 0x00000000
 0x48d67860: istate->_stack_base   = 0x48d67824
 0x48d67864: istate->_stack_limit  = 0x48d67810
 0x48d67868: istate->_monitor_base = 0x48d6782c
 0x48d6786c: istate->_self_link    = 0x48d6782c
 0x48d67870: frame_type            = INTERPRETER_FRAME
 0x48d67874: next_frame            = 0x48d678dc

What do you think the next steps are? How is the interplay between the various
components and subsystems of the OpenJDK JVM affected by adding the -Xint flag,
beyond "it disables native compilation"? Also, when I pass -shark to the JVM
and it reports that the option is unsupported, does that mean the JIT is not
actually included in the distribution on this device?
