Need help on debugging JVM crash

Yumin Qi yumin.qi at oracle.com
Tue Mar 3 16:22:44 UTC 2020


Hi, Sundara,

On 3/3/20 8:02 AM, Sundara Mohan M wrote:
> Hi,
>      I am seeing JVM crashes on our system in a GC thread with Parallel GC
> on x86 Linux. I have observed the same crash on the 11.0.6, 13.0.2, and
> 13.0.1 GA builds.
> Adding some log lines to give some context.
>
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x00007f669c964311, pid=66684, tid=71106
> #
> # JRE version: OpenJDK Runtime Environment (13.0.1+9) (build 13.0.1+9)
> # Java VM: OpenJDK 64-Bit Server VM (13.0.1+9, mixed mode, tiered, parallel
> gc, linux-amd64)
> # Problematic frame:
> # V  [libjvm.so+0xcd3311]  PCMarkAndPushClosure::do_oop(oopDesc**)+0x51
> #
> # No core dump will be written. Core dumps have been disabled. To enable
> core dumping, try "ulimit -c unlimited" before starting Java again
> #
> # If you would like to submit a bug report, please visit:
> #   https://github.com/AdoptOpenJDK/openjdk-build/issues
> #
>
> Host: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz, 48 cores, 125G, Red Hat
> Enterprise Linux Server release 6.10 (Santiago)
> Time: Thu Feb  6 11:43:48 2020 UTC elapsed time: 198626 seconds (2d 7h 10m
> 26s)
>
>
> Following are the stack traces from two crashes:
> ex1:
> Stack: [0x00007fd01cbdb000,0x00007fd01ccdb000],  sp=0x00007fd01ccd8890,
>   free space=1014k
> Native frames: (J=compiled Java code, A=aot compiled Java code,
> j=interpreted, Vv=VM code, C=native code)
> V  [libjvm.so+0xcc0121]  PCMarkAndPushClosure::do_oop(oopDesc**)+0x51
> V  [libjvm.so+0xc58c8b]  OopMapSet::oops_do(frame const*, RegisterMap
> const*, OopClosure*)+0x2eb
> V  [libjvm.so+0x7521e9]  frame::oops_do_internal(OopClosure*,
> CodeBlobClosure*, RegisterMap*, bool)+0x99
> V  [libjvm.so+0xf55757]  JavaThread::oops_do(OopClosure*,
> CodeBlobClosure*)+0x187
> V  [libjvm.so+0xcbb100]  ThreadRootsMarkingTask::do_it(GCTaskManager*,
> unsigned int)+0xb0
> V  [libjvm.so+0x7e0f8b]  GCTaskThread::run()+0x1eb
> V  [libjvm.so+0xf5d43d]  Thread::call_run()+0x10d
> V  [libjvm.so+0xc74337]  thread_native_entry(Thread*)+0xe7
>
> JavaThread 0x00007fbeb9209800 (nid = 82380) was being processed
> Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
> v  ~RuntimeStub::_new_array_Java
> J 62465 c2
> ch.qos.logback.classic.spi.ThrowableProxy.<init>(Ljava/lang/Throwable;)V
> (207 bytes) @ 0x00007fd00ad43704 [0x00007fd00ad41420+0x00000000000022e4]
> J 474206 c2
> org.eclipse.jetty.util.log.JettyAwareLogger.log(Lorg/slf4j/Marker;ILjava/lang/String;[Ljava/lang/Object;Ljava/lang/Throwable;)V
> (134 bytes) @ 0x00007fd00c4e81ec [0x00007fd00c4e7ee0+0x000000000000030c]
> j
>   org.eclipse.jetty.util.log.JettyAwareLogger.warn(Ljava/lang/String;Ljava/lang/Throwable;)V+7
> j
>   org.eclipse.jetty.util.log.Slf4jLog.warn(Ljava/lang/String;Ljava/lang/Throwable;)V+6
> j
>   org.eclipse.jetty.server.HttpChannel.handleException(Ljava/lang/Throwable;)V+181
> j
>   org.eclipse.jetty.server.HttpChannelOverHttp.handleException(Ljava/lang/Throwable;)V+13
> J 64106 c2 org.eclipse.jetty.server.HttpChannel.handle()Z (997 bytes) @
> 0x00007fd00c6d2cd4 [0x00007fd00c6cdec0+0x0000000000004e14]
> J 280430 c2 org.eclipse.jetty.server.HttpConnection.onFillable()V (334
> bytes) @ 0x00007fd00da925f0 [0x00007fd00da91e40+0x00000000000007b0]
> J 41979 c2 org.eclipse.jetty.io.ChannelEndPoint$2.run()V (12 bytes) @
> 0x00007fd00a14f604 [0x00007fd00a14f4e0+0x0000000000000124]
> J 86362 c2 org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run()V
> (565 bytes) @ 0x00007fd0087d7e34 [0x00007fd0087d7cc0+0x0000000000000174]
> J 75998 c2 java.lang.Thread.run()V java.base@13.0.2 (17 bytes) @
> 0x00007fd00c93b8d8 [0x00007fd00c93b8a0+0x0000000000000038]
> v  ~StubRoutines::call_stub
>
> ex2:
> Stack: [0x00007f669869f000,0x00007f669879f000],  sp=0x00007f669879c890,
>   free space=1014k
> Native frames: (J=compiled Java code, A=aot compiled Java code,
> j=interpreted, Vv=VM code, C=native code)
>
> V  [libjvm.so+0xcd3311]  PCMarkAndPushClosure::do_oop(oopDesc**)+0x51
> V  [libjvm.so+0xc6bf0b]  OopMapSet::oops_do(frame const*, RegisterMap const*,
> OopClosure*)+0x2eb
> V  [libjvm.so+0x765489]  frame::oops_do_internal(OopClosure*,
> CodeBlobClosure*, RegisterMap*, bool)+0x99
> V  [libjvm.so+0xf68b17]  JavaThread::oops_do(OopClosure*,
> CodeBlobClosure*)+0x187
> V  [libjvm.so+0xcce2f0]  ThreadRootsMarkingTask::do_it(GCTaskManager*,
> unsigned int)+0xb0
> V  [libjvm.so+0x7f422b]  GCTaskThread::run()+0x1eb
> V  [libjvm.so+0xf707fd]  Thread::call_run()+0x10d
> V  [libjvm.so+0xc875b7]  thread_native_entry(Thread*)+0xe7
>
> JavaThread 0x00007f5518004000 (nid = 75659) was being processed
> Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
> v  ~RuntimeStub::_new_array_Java
> J 54174 c2
> ch.qos.logback.classic.spi.ThrowableProxy.<init>(Ljava/lang/Throwable;)V
> (207 bytes) @ 0x00007f6687d92678 [0x00007f6687d8c700+0x0000000000005f78]
> J 334031 c2
> com.xmas.webservice.exception.ExceptionLoggingWrapper.execute()V (1004
> bytes) @ 0x00007f6686ede430 [0x00007f6686edd580+0x0000000000000eb0]
> J 53431 c2
> com.xmas.webservice.exception.mapper.AbstractExceptionMapper.toResponse(Lcom/xmas/beans/exceptions/mapper/V3ErrorCode;Ljava/lang/Exception;)Ljavax/ws/rs/core/Response;
> (105 bytes) @ 0x00007f6687db88b0 [0x00007f6687db8660+0x0000000000000250]
> J 63819 c2
> com.xmas.webservice.exception.mapper.RequestBlockedExceptionMapper.toResponse(Ljava/lang/Exception;)Ljavax/ws/rs/core/Response;
> (9 bytes) @ 0x00007f6686a6ed9c [0x00007f6686a6ecc0+0x00000000000000dc]
> J 334032 c2
> com.xmas.webservice.filters.ResponseSerializationWorker.processException()Ljava/io/InputStream;
> (332 bytes) @ 0x00007f668992ad34 [0x00007f668992a840+0x00000000000004f4]
> J 403918 c2
> com.xmas.webservice.filters.ResponseSerializationWorker.execute()Z (272
> bytes) @ 0x00007f66869d67fc [0x00007f66869d5e80+0x000000000000097c]
> J 17530 c2 com.lafaspot.common.concurrent.internal.WorkerWrapper.execute()Z
> (208 bytes) @ 0x00007f66848b3708 [0x00007f66848b36a0+0x0000000000000068]
> J 31970% c2
> com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Lcom/lafaspot/common/concurrent/internal/WorkerManagerState;
> (486 bytes) @ 0x00007f668608dcb0 [0x00007f668608d5e0+0x00000000000006d0]
> j
>   com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Ljava/lang/Object;+1
> J 4889 c1 java.util.concurrent.FutureTask.run()V java.base@13.0.1 (123
> bytes) @ 0x00007f667d0be604 [0x00007f667d0bdf80+0x0000000000000684]
> J 7487 c1
> java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V
> java.base@13.0.1 (187 bytes) @ 0x00007f667dd45854
> [0x00007f667dd44a60+0x0000000000000df4]
> J 7486 c1 java.util.concurrent.ThreadPoolExecutor$Worker.run()V
> java.base@13.0.1 (9 bytes) @ 0x00007f667d1f643c
> [0x00007f667d1f63c0+0x000000000000007c]
> J 7078 c1 java.lang.Thread.run()V java.base@13.0.1 (17 bytes) @
> 0x00007f667d1f2d74 [0x00007f667d1f2c40+0x0000000000000134]
> v  ~StubRoutines::call_stub
>
> Not very frequent, but over ~90 days we have seen ~120 crashes with the
> following signal:
> siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr:
> 0x0000000000000000
> This signal is generated when a non-canonical address is accessed on
> Linux.
>
> As suggested by Stefan in another thread, I tried adding
> VerifyBeforeGC/VerifyAfterGC, but it seems to increase latency so much that
> the application does not survive our production traffic (requests time out
> and fail).
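
(For reference, on a product build those verification options are diagnostic
flags, so I believe they have to be enabled with something along the lines of:

  java -XX:+UnlockDiagnosticVMOptions -XX:+VerifyBeforeGC -XX:+VerifyAfterGC ...

They verify the heap around every collection, so the pause-time overhead you
are seeing is expected.)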
>
> Questions
> 1. When I looked at the source code for printing the stack trace, I see the
> following:
> https://github.com/openjdk/jdk11u/blob/master/src/hotspot/share/utilities/vmError.cpp#L696
> (prints the native stack trace)
> https://github.com/openjdk/jdk11u/blob/master/src/hotspot/share/utilities/vmError.cpp#L718
> (prints the Java thread stack trace if a Java thread is involved in the GC
> crash)
>    a. How do you know this Java thread was involved in the JVM crash?
When the GC processes a thread stack as a root, the Java thread is recorded
first. That is why, at the crash, that Java thread was printed out.
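
To make this a bit more concrete, here is a toy sketch of the mechanism as I
understand it (the names below are illustrative, not the actual HotSpot
types): the GC worker records the JavaThread whose stack it is about to walk,
and the error reporter prints that recorded thread when the worker crashes.

#include <cstdio>

// Toy model only: the GC worker remembers which Java thread's stack it is
// currently walking, and the error reporter prints that field if the worker
// crashes in the middle of the scan.

struct JavaThreadModel {
    void* addr;   // stands in for the JavaThread* shown in the hs_err file
    int   nid;    // native thread id
};

struct GCWorkerModel {
    JavaThreadModel* processed_thread = nullptr;  // set while scanning a stack

    void scan_thread_roots(JavaThreadModel* jt) {
        processed_thread = jt;   // recorded before the stack walk begins
        // ... walk jt's frames and visit their oops; a crash in here still
        // leaves processed_thread pointing at jt ...
        processed_thread = nullptr;
    }
};

// Roughly what the hs_err writer does when the crashing thread is a GC worker:
void report_processed_thread(const GCWorkerModel& worker) {
    if (worker.processed_thread != nullptr) {
        std::printf("JavaThread %p (nid = %d) was being processed\n",
                    worker.processed_thread->addr,
                    worker.processed_thread->nid);
    }
}

That is where the "JavaThread 0x... (nid = ...) was being processed" line in
your hs_err files comes from.
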
>    b. Can I assume the Java thread printed after the native stack trace was
> the culprit?

Please check this thread's stack frames. While the GC was doing its marking
work, I think it encountered a bad oop. Check whether the frame being scanned
is a compiled frame; if so, the problem may be related to the compiled code.

>    c. Since I am seeing the same frame (~RuntimeStub::_new_array_Java, J
> 54174 c2 ch.qos.logback.classic.spi.ThrowableProxy.<init>..) but a different
> stack trace in both crashes, can this be the root cause?

It is a C2-compiled frame, so the bad oop could be the result of a compiler
issue. However, more detailed debug information is needed to reach that
conclusion.
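
One way to get that detail, assuming your environment allows it, is to let the
JVM write a core file (the hs_err header already hints at this) and then open
the core with the serviceability debugger, roughly:

  ulimit -c unlimited        # before starting the JVM, so a core is written
  jhsdb hsdb --core <corefile> --exe $JAVA_HOME/bin/java

which should let you inspect the frame and the oop that the marking closure
was processing when it crashed.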


Thanks

Yumin

> 2. I am thinking of excluding compilation of the
> ch.qos.logback.classic.spi.ThrowableProxy class and running in production to
> see whether compilation of this code is the cause. Does that make sense?
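
If you try that, something along these lines should exclude just that
constructor from JIT compilation (note the quoting, since the shell would
otherwise interpret the angle brackets):

  java '-XX:CompileCommand=exclude,ch.qos.logback.classic.spi.ThrowableProxy::<init>' ...

or, to exclude every method of the class:

  java '-XX:CompileCommand=exclude,ch.qos.logback.classic.spi.ThrowableProxy::*' ...

Only the excluded methods run interpreted, so the overall latency impact
should be small.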
>
> 3. Any other suggestions on debugging this further?
>
> TIA
> Sundar


