Need help debugging a JVM crash

Sundara Mohan M m.sundar85 at gmail.com
Tue Mar 3 17:39:05 UTC 2020


Hi Yumin,
On Tue, Mar 3, 2020 at 11:23 AM Yumin Qi <yumin.qi at oracle.com> wrote:

> Hi, Sundara
> On 3/3/20 8:02 AM, Sundara Mohan M wrote:
>
> Hi,
>     I am seeing JVM crashes on our system in a GC thread with the
> parallel GC on x86 Linux. I have observed the same crash on JDK 11.0.6,
> 13.0.1, and 13.0.2 GA builds.
> Adding some log lines to give context.
>
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x00007f669c964311, pid=66684, tid=71106
> #
> # JRE version: OpenJDK Runtime Environment (13.0.1+9) (build 13.0.1+9)
> # Java VM: OpenJDK 64-Bit Server VM (13.0.1+9, mixed mode, tiered, parallel
> gc, linux-amd64)
> # Problematic frame:
> # V  [libjvm.so+0xcd3311]  PCMarkAndPushClosure::do_oop(oopDesc**)+0x51
> #
> # No core dump will be written. Core dumps have been disabled. To enable
> core dumping, try "ulimit -c unlimited" before starting Java again
> #
> # If you would like to submit a bug report, please visit:
> #   https://github.com/AdoptOpenJDK/openjdk-build/issues
> #
>
> Host: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz, 48 cores, 125G, Red Hat
> Enterprise Linux Server release 6.10 (Santiago)
> Time: Thu Feb  6 11:43:48 2020 UTC elapsed time: 198626 seconds (2d 7h 10m
> 26s)
>
>
> Following are the stack traces from two crashes.
> ex1:
> Stack: [0x00007fd01cbdb000,0x00007fd01ccdb000],  sp=0x00007fd01ccd8890,
>  free space=1014k
> Native frames: (J=compiled Java code, A=aot compiled Java code,
> j=interpreted, Vv=VM code, C=native code)
> V  [libjvm.so+0xcc0121]  PCMarkAndPushClosure::do_oop(oopDesc**)+0x51
> V  [libjvm.so+0xc58c8b]  OopMapSet::oops_do(frame const*, RegisterMap
> const*, OopClosure*)+0x2eb
> V  [libjvm.so+0x7521e9]  frame::oops_do_internal(OopClosure*,
> CodeBlobClosure*, RegisterMap*, bool)+0x99
> V  [libjvm.so+0xf55757]  JavaThread::oops_do(OopClosure*,
> CodeBlobClosure*)+0x187
> V  [libjvm.so+0xcbb100]  ThreadRootsMarkingTask::do_it(GCTaskManager*,
> unsigned int)+0xb0
> V  [libjvm.so+0x7e0f8b]  GCTaskThread::run()+0x1eb
> V  [libjvm.so+0xf5d43d]  Thread::call_run()+0x10d
> V  [libjvm.so+0xc74337]  thread_native_entry(Thread*)+0xe7
>
> JavaThread 0x00007fbeb9209800 (nid = 82380) was being processed
> Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
> v  ~RuntimeStub::_new_array_Java
> J 62465 c2
> ch.qos.logback.classic.spi.ThrowableProxy.<init>(Ljava/lang/Throwable;)V
> (207 bytes) @ 0x00007fd00ad43704 [0x00007fd00ad41420+0x00000000000022e4]
> J 474206 c2
> org.eclipse.jetty.util.log.JettyAwareLogger.log(Lorg/slf4j/Marker;ILjava/lang/String;[Ljava/lang/Object;Ljava/lang/Throwable;)V
> (134 bytes) @ 0x00007fd00c4e81ec [0x00007fd00c4e7ee0+0x000000000000030c]
> j
>  org.eclipse.jetty.util.log.JettyAwareLogger.warn(Ljava/lang/String;Ljava/lang/Throwable;)V+7
> j
>  org.eclipse.jetty.util.log.Slf4jLog.warn(Ljava/lang/String;Ljava/lang/Throwable;)V+6
> j
>  org.eclipse.jetty.server.HttpChannel.handleException(Ljava/lang/Throwable;)V+181
> j
>  org.eclipse.jetty.server.HttpChannelOverHttp.handleException(Ljava/lang/Throwable;)V+13
> J 64106 c2 org.eclipse.jetty.server.HttpChannel.handle()Z (997 bytes) @
> 0x00007fd00c6d2cd4 [0x00007fd00c6cdec0+0x0000000000004e14]
> J 280430 c2 org.eclipse.jetty.server.HttpConnection.onFillable()V (334
> bytes) @ 0x00007fd00da925f0 [0x00007fd00da91e40+0x00000000000007b0]
> J 41979 c2 org.eclipse.jetty.io.ChannelEndPoint$2.run()V (12 bytes) @
> 0x00007fd00a14f604 [0x00007fd00a14f4e0+0x0000000000000124]
> J 86362 c2 org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run()V
> (565 bytes) @ 0x00007fd0087d7e34 [0x00007fd0087d7cc0+0x0000000000000174]
> J 75998 c2 java.lang.Thread.run()V java.base@13.0.2 (17 bytes) @
> 0x00007fd00c93b8d8 [0x00007fd00c93b8a0+0x0000000000000038]
> v  ~StubRoutines::call_stub
>
> ex2:
> Stack: [0x00007f669869f000,0x00007f669879f000],  sp=0x00007f669879c890,
>  free space=1014k
> Native frames: (J=compiled Java code, A=aot compiled Java code,
> j=interpreted, Vv=VM code, C=native code)
>
> V  [libjvm.so+0xcd3311]  PCMarkAndPushClosure::do_oop(oopDesc**)+0x51
> V  [libjvm.so+0xc6bf0b]  OopMapSet::oops_do(frame const*, RegisterMap
> const*, OopClosure*)+0x2eb
> V  [libjvm.so+0x765489]  frame::oops_do_internal(OopClosure*,
> CodeBlobClosure*, RegisterMap*, bool)+0x99
> V  [libjvm.so+0xf68b17]  JavaThread::oops_do(OopClosure*,
> CodeBlobClosure*)+0x187
> V  [libjvm.so+0xcce2f0]  ThreadRootsMarkingTask::do_it(GCTaskManager*,
> unsigned int)+0xb0
> V  [libjvm.so+0x7f422b]  GCTaskThread::run()+0x1eb
> V  [libjvm.so+0xf707fd]  Thread::call_run()+0x10d
> V  [libjvm.so+0xc875b7]  thread_native_entry(Thread*)+0xe7
>
> JavaThread 0x00007f5518004000 (nid = 75659) was being processed
> Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
> v  ~RuntimeStub::_new_array_Java
> J 54174 c2
> ch.qos.logback.classic.spi.ThrowableProxy.<init>(Ljava/lang/Throwable;)V
> (207 bytes) @ 0x00007f6687d92678 [0x00007f6687d8c700+0x0000000000005f78]
> J 334031 c2
> com.xmas.webservice.exception.ExceptionLoggingWrapper.execute()V (1004
> bytes) @ 0x00007f6686ede430 [0x00007f6686edd580+0x0000000000000eb0]
> J 53431 c2
> com.xmas.webservice.exception.mapper.AbstractExceptionMapper.toResponse(Lcom/xmas/beans/exceptions/mapper/V3ErrorCode;Ljava/lang/Exception;)Ljavax/ws/rs/core/Response;
> (105 bytes) @ 0x00007f6687db88b0 [0x00007f6687db8660+0x0000000000000250]
> J 63819 c2
> com.xmas.webservice.exception.mapper.RequestBlockedExceptionMapper.toResponse(Ljava/lang/Exception;)Ljavax/ws/rs/core/Response;
> (9 bytes) @ 0x00007f6686a6ed9c [0x00007f6686a6ecc0+0x00000000000000dc]
> J 334032 c2
> com.xmas.webservice.filters.ResponseSerializationWorker.processException()Ljava/io/InputStream;
> (332 bytes) @ 0x00007f668992ad34 [0x00007f668992a840+0x00000000000004f4]
> J 403918 c2
> com.xmas.webservice.filters.ResponseSerializationWorker.execute()Z (272
> bytes) @ 0x00007f66869d67fc [0x00007f66869d5e80+0x000000000000097c]
> J 17530 c2 com.lafaspot.common.concurrent.internal.WorkerWrapper.execute()Z
> (208 bytes) @ 0x00007f66848b3708 [0x00007f66848b36a0+0x0000000000000068]
> J 31970% c2
> com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Lcom/lafaspot/common/concurrent/internal/WorkerManagerState;
> (486 bytes) @ 0x00007f668608dcb0 [0x00007f668608d5e0+0x00000000000006d0]
> j
>  com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Ljava/lang/Object;+1
> J 4889 c1 java.util.concurrent.FutureTask.run()V java.base@13.0.1 (123
> bytes) @ 0x00007f667d0be604 [0x00007f667d0bdf80+0x0000000000000684]
> J 7487 c1
> java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V java.base@13.0.1 (187 bytes) @ 0x00007f667dd45854
> [0x00007f667dd44a60+0x0000000000000df4]
> J 7486 c1 java.util.concurrent.ThreadPoolExecutor$Worker.run()V java.base@13.0.1 (9 bytes) @ 0x00007f667d1f643c
> [0x00007f667d1f63c0+0x000000000000007c]
> J 7078 c1 java.lang.Thread.run()V java.base@13.0.1 (17 bytes) @
> 0x00007f667d1f2d74 [0x00007f667d1f2c40+0x0000000000000134]
> v  ~StubRoutines::call_stub
>
> Not very frequent, but over ~90 days we have seen ~120 crashes with the
> following signal:
> siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr:
> 0x0000000000000000
> On Linux this signal is generated when we try to access a non-canonical
> address.
>
> As suggested by Stefan in another thread, I tried adding
> VerifyBeforeGC/VerifyAfterGC, but they increase latency so much that the
> application does not survive our production traffic (requests time out
> and fail).
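> For reference, the flags were added roughly like this (they are
> diagnostic flags, so they must be unlocked first; app.jar stands in for
> our real service):
>
>   java -XX:+UnlockDiagnosticVMOptions \
>        -XX:+VerifyBeforeGC \
>        -XX:+VerifyAfterGC \
>        -jar app.jar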
>
> Questions
> 1. When I looked at the source code for printing the stack trace, I see
> the following:
> https://github.com/openjdk/jdk11u/blob/master/src/hotspot/share/utilities/vmError.cpp#L696
> (prints the native stack trace)
> https://github.com/openjdk/jdk11u/blob/master/src/hotspot/share/utilities/vmError.cpp#L718
> (prints the Java thread stack trace if it was involved in the GC crash)
>   a. How do you know this Java thread was involved in the JVM crash?
>
> When the GC processes a thread stack as a root, the Java thread is
> recorded first. That is why, at the time of the crash, that Java thread
> was printed out.
>
>   b. Can I assume the Java thread printed after the native stack trace
> was the culprit?
>
> Please check this thread's stack frames. When the GC was doing marking
> work, I think it encountered a bad oop. Check whether it is a compiled
> frame; if so, it may be related to compiled code.
>
>   c. Since I am seeing the same frames (~RuntimeStub::_new_array_Java, J
> 54174 c2 ch.qos.logback.classic.spi.ThrowableProxy.<init>..) but a
> different stack trace in the two crashes, can this be the root cause?
>
> It is a C2-compiled frame. The bad oop could be a result of the compiler.
>
Actually, the top two frames are always the same across the different
crashes:
v  ~RuntimeStub::_new_array_Java
J 54174 c2
ch.qos.logback.classic.spi.ThrowableProxy.<init>(Ljava/lang/Throwable;)V
(207 bytes) @ 0x00007f6687d92678 [0x00007f6687d8c700+0x0000000000005f78]
In this case, do you think the JVM code (frame 1) or the C2-compiled code
(frame 2) might be the issue?
Is there any way to identify that, and what kind of debug flags/settings
might give us this information?
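For instance, would taking C2 out of the picture be a valid experiment?
Something like the following (just a sketch on my side, with app.jar
standing in for our service):

  # run with C1 only, so nothing is compiled by C2
  java -XX:TieredStopAtLevel=1 -jar app.jar

If the crashes disappear with that, I would read it as pointing at the
C2-compiled code rather than the GC.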

> It also needs detailed debug information to reach a conclusion.
>
Do you think any of the information dumped in the hs_err* file might give
us more clues (such as the register contents, instructions, or a core
file)?

Can you please let me know what additional details might help to reach a
conclusion, and how to collect that information?
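In the meantime I plan to enable core dumps, since the hs_err header above
says they are disabled. Roughly (the paths here are placeholders):

  # in the shell/init script that launches the JVM
  ulimit -c unlimited
  java -XX:ErrorFile=/var/log/java/hs_err_pid%p.log -jar app.jar

That way the next crash should leave both a core file and the hs_err log
for inspection.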

Thanks
>
> Yumin
>
> 2. I am thinking of excluding compilation of the
> ch.qos.logback.classic.spi.ThrowableProxy class and running in production
> to see if compilation of this method is the cause. Does that make sense?
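Concretely, I had something like this in mind (a sketch; the value is
quoted because <init> contains shell metacharacters):

  java -XX:CompileCommand='exclude,ch.qos.logback.classic.spi.ThrowableProxy::<init>' \
       -jar app.jar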
>
> 3. Any other suggestions on debugging this further?
>
> TIA
> Sundar
>
>
Thanks
Sundar


