Parallel GC Thread crash

Stefan Karlsson stefan.karlsson at oracle.com
Mon Feb 10 19:42:49 UTC 2020


Hi Sundar,

On 2020-02-10 19:32, Sundara Mohan M wrote:
> Hi Stefan,
>     We started seeing more crashes on JDK13.0.1+9
>
> Since seeing it on GC Task Thread assumed it is related to GC.

As I said in my previous mail, I don't think this is caused by GC code. 
More below.

>
> # Problematic frame:
> # V  [libjvm.so+0xd183c0]  PSRootsClosure<false>::do_oop(oopDesc**)+0x30
>
> Command Line: -XX:+AlwaysPreTouch -Xms64000m -Xmx64000m 
> -XX:NewSize=40000m -XX:+DisableExplicitGC -Xnoclassgc 
> -XX:+UseParallelGC -XX:ParallelGCThreads=40 -XX:ConcGCTh
> reads=5 ...
>
> Host: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz, 48 cores, 125G, Red 
> Hat Enterprise Linux Server release 6.10 (Santiago)
> Time: Fri Feb  7 11:15:04 2020 UTC elapsed time: 286290 seconds (3d 7h 
> 31m 30s)
>
> ---------------  T H R E A D  ---------------
>
> Current thread (0x00007fca6c074000):  GCTaskThread "ParGC Thread#28" 
> [stack: 0x00007fba72ff1000,0x00007fba730f1000] [id=56530]
>
> Stack: [0x00007fba72ff1000,0x00007fba730f1000], 
>  sp=0x00007fba730ee850,  free space=1014k
> Native frames: (J=compiled Java code, A=aot compiled Java code, 
> j=interpreted, Vv=VM code, C=native code)
> V  [libjvm.so+0xd183c0]  PSRootsClosure<false>::do_oop(oopDesc**)+0x30
> V  [libjvm.so+0xc6bf0b]  OopMapSet::oops_do(frame const*, RegisterMap 
> const*, OopClosure*)+0x2eb
> V  [libjvm.so+0x765489]  frame::oops_do_internal(OopClosure*, 
> CodeBlobClosure*, RegisterMap*, bool)+0x99
> V  [libjvm.so+0xf68b17]  JavaThread::oops_do(OopClosure*, 
> CodeBlobClosure*)+0x187
> V  [libjvm.so+0xd190be]  ThreadRootsTask::do_it(GCTaskManager*, 
> unsigned int)+0x6e
> V  [libjvm.so+0x7f422b]  GCTaskThread::run()+0x1eb
> V  [libjvm.so+0xf707fd]  Thread::call_run()+0x10d
> V  [libjvm.so+0xc875b7]  thread_native_entry(Thread*)+0xe7
>
> JavaThread 0x00007fb8f4036800 (nid = 60927) was being processed
> Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
> v  ~RuntimeStub::_new_array_Java
> J 58520 c2 
> ch.qos.logback.classic.spi.ThrowableProxy.<init>(Ljava/lang/Throwable;)V 
> (207 bytes) @ 0x00007fca5fd23dec [0x00007fca5fd1dbc0+0x000000000000622c]
> J 66864 c2 webservice.exception.ExceptionLoggingWrapper.execute()V 
> (1004 bytes) @ 0x00007fca60c02588 [0x00007fca60bffce0+0x00000000000028a8]
> J 58224 c2 
> webservice.exception.mapper.AbstractExceptionMapper.toResponse(Lbeans/exceptions/mapper/V3ErrorCode;Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; 
> (105 bytes) @ 0x00007fca5f59bad8 [0x00007fca5f59b880+0x0000000000000258]
> J 69992 c2 
> webservice.exception.mapper.JediRequestBlockedExceptionMapper.toResponse(Ljava/lang/Exception;)Ljavax/ws/rs/core/Response; 
> (9 bytes) @ 0x00007fca5e1019f4 [0x00007fca5e101940+0x00000000000000b4]
> J 55265 c2 
> webservice.filters.ResponseSerializationWorker.processException()Ljava/io/InputStream; 
> (332 bytes) @ 0x00007fca5f6f58e0 [0x00007fca5f6f5700+0x00000000000001e0]
> J 483122 c2 webservice.filters.ResponseSerializationWorker.execute()Z 
> (272 bytes) @ 0x00007fca622fc2b4 [0x00007fca622fbc80+0x0000000000000634]
> J 15811% c2 
> com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Lcom/lafaspot/common/concurrent/internal/WorkerManagerState; 
> (486 bytes) @ 0x00007fca5c108794 [0x00007fca5c1082a0+0x00000000000004f4]
> j 
>  com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Ljava/lang/Object;+1
> J 4586 c1 java.util.concurrent.FutureTask.run()V java.base at 13.0.1 (123 
> bytes) @ 0x00007fca54d27184 [0x00007fca54d26b00+0x0000000000000684]
> J 7550 c1 
> java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V 
> java.base at 13.0.1 (187 bytes) @ 0x00007fca54fbb6d4 
> [0x00007fca54fba8e0+0x0000000000000df4]
> J 7549 c1 java.util.concurrent.ThreadPoolExecutor$Worker.run()V 
> java.base at 13.0.1 (9 bytes) @ 0x00007fca5454b93c 
> [0x00007fca5454b8c0+0x000000000000007c]
> J 4585 c1 java.lang.Thread.run()V java.base at 13.0.1 (17 bytes) @ 
> 0x00007fca54d250f4 [0x00007fca54d24fc0+0x0000000000000134]
> v  ~StubRoutines::call_stub
>
> siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: 
> 0x0000000000000000
>
> Does JDK11 and 13 have different code for GC. Do you think 
> downgrading(JDK11 stable)/upgrading(JDK-13.0.2) might help here?

You should at least move to 13.0.2, to get the latest bug fixes/patches.

There has been a lot of changes in all areas of the JVM between 11 and 
13. We don't yet know the root cause of this crash, and I can't say if 
this is caused by new changes or not. Have you or anyone filed a bug 
report for this?

> Any insight to debug this will be helpful.

Did you try my previous suggestion to run with -XX:+VerifyBeforeGC and 
-XX:+VerifyAfterGC? If you can tolerate the longer GC times it 
introduces, then you could try to run with 
-XX:+UnlockDiagnosticVMOptions -XX:+VerifyBeforeGC -XX:+VerifyAfterGC .

Cheers,
StefanK

>
> TIA
> Sundar
>
> On Tue, Feb 4, 2020 at 5:47 AM Stefan Karlsson 
> <stefan.karlsson at oracle.com <mailto:stefan.karlsson at oracle.com>> wrote:
>
>     Hi Sundar,
>
>     The GC crashes when it encounters something bad on the stack:
>      > V  [libjvm.so+0xc6bf0b]  OopMapSet::oops_do(frame const*,
>     RegisterMap
>      > const*, OopClosure*)+0x2eb
>      > V  [libjvm.so+0x765489] frame::oops_do_internal(OopClosure*,
>
>     This is probably not a GC bug. It's more likely that this is
>     caused by
>     the JIT compiler. I see in your hotspot-runtime-dev thread, that you
>     also get crashes in other compiler related areas.
>
>     If you want to rule out the GC, you can run with
>     -XX:+VerifyBeforeGC and
>     -XX:+VerifyAfterGC, and see if this asserts before the GC has started
>     running.
>
>     StefanK
>
>     On 2020-02-04 04:38, Sundara Mohan M wrote:
>     > Hi,
>     >     I am seeing following crashes frequently on our servers
>     > #
>     > # A fatal error has been detected by the Java Runtime Environment:
>     > #
>     > #  SIGSEGV (0xb) at pc=0x00007fca3281d311, pid=103575, tid=108299
>     > #
>     > # JRE version: OpenJDK Runtime Environment (13.0.1+9) (build
>     13.0.1+9)
>     > # Java VM: OpenJDK 64-Bit Server VM (13.0.1+9, mixed mode,
>     tiered, parallel
>     > gc, linux-amd64)
>     > # Problematic frame:
>     > # V  [libjvm.so+0xcd3311]
>     PCMarkAndPushClosure::do_oop(oopDesc**)+0x51
>     > #
>     > # No core dump will be written. Core dumps have been disabled.
>     To enable
>     > core dumping, try "ulimit -c unlimited" before starting Java again
>     > #
>     > # If you would like to submit a bug report, please visit:
>     > # https://github.com/AdoptOpenJDK/openjdk-build/issues
>     > #
>     >
>     >
>     > ---------------  T H R E A D  ---------------
>     >
>     > Current thread (0x00007fca2c051000):  GCTaskThread "ParGC
>     Thread#8" [stack:
>     > 0x00007fca30277000,0x00007fca30377000] [id=108299]
>     >
>     > Stack: [0x00007fca30277000,0x00007fca30377000],
>     sp=0x00007fca30374890,
>     >   free space=1014k
>     > Native frames: (J=compiled Java code, A=aot compiled Java code,
>     > j=interpreted, Vv=VM code, C=native code)
>     > V  [libjvm.so+0xcd3311] PCMarkAndPushClosure::do_oop(oopDesc**)+0x51
>     > V  [libjvm.so+0xc6bf0b]  OopMapSet::oops_do(frame const*,
>     RegisterMap
>     > const*, OopClosure*)+0x2eb
>     > V  [libjvm.so+0x765489] frame::oops_do_internal(OopClosure*,
>     > CodeBlobClosure*, RegisterMap*, bool)+0x99
>     > V  [libjvm.so+0xf68b17]  JavaThread::oops_do(OopClosure*,
>     > CodeBlobClosure*)+0x187
>     > V  [libjvm.so+0xcce2f0]
>     ThreadRootsMarkingTask::do_it(GCTaskManager*,
>     > unsigned int)+0xb0
>     > V  [libjvm.so+0x7f422b]  GCTaskThread::run()+0x1eb
>     > V  [libjvm.so+0xf707fd]  Thread::call_run()+0x10d
>     > V  [libjvm.so+0xc875b7] thread_native_entry(Thread*)+0xe7
>     >
>     > JavaThread 0x00007fb85c004800 (nid = 111387) was being processed
>     > Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
>     > v  ~RuntimeStub::_new_array_Java
>     > J 225122 c2
>     >
>     ch.qos.logback.classic.spi.ThrowableProxy.<init>(Ljava/lang/Throwable;)V
>     > (207 bytes) @ 0x00007fca21f1a5d8
>     [0x00007fca21f17f20+0x00000000000026b8]
>     > J 62342 c2
>     webservice.exception.ExceptionLoggingWrapper.execute()V (1004
>     > bytes) @ 0x00007fca20f0aec8 [0x00007fca20f07f40+0x0000000000002f88]
>     > J 225129 c2
>     >
>     webservice.exception.mapper.AbstractExceptionMapper.toResponse(Lbeans/exceptions/mapper/V3ErrorCode;Ljava/lang/Exception;)Ljavax/ws/rs/core/Response;
>     > (105 bytes) @ 0x00007fca1da512ac
>     [0x00007fca1da51100+0x00000000000001ac]
>     > J 131643 c2
>     >
>     webservice.exception.mapper.RequestBlockedExceptionMapper.toResponse(Ljava/lang/Exception;)Ljavax/ws/rs/core/Response;
>     > (9 bytes) @ 0x00007fca20ce6190
>     [0x00007fca20ce60c0+0x00000000000000d0]
>     > J 55114 c2
>     >
>     webservice.filters.ResponseSerializationWorker.processException()Ljava/io/InputStream;
>     > (332 bytes) @ 0x00007fca2051fe64
>     [0x00007fca2051f820+0x0000000000000644]
>     > J 57859 c2
>     webservice.filters.ResponseSerializationWorker.execute()Z (272
>     > bytes) @ 0x00007fca1ef2ed18 [0x00007fca1ef2e140+0x0000000000000bd8]
>     > J 16114% c2
>     >
>     com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Lcom/lafaspot/common/concurrent/internal/WorkerManagerState;
>     > (486 bytes) @ 0x00007fca1ced465c
>     [0x00007fca1ced4200+0x000000000000045c]
>     > j
>     >
>      com.lafaspot.common.concurrent.internal.WorkerManagerOneThread.call()Ljava/lang/Object;+1
>     > J 11639 c2 java.util.concurrent.FutureTask.run()V
>     java.base at 13.0.1 (123
>     > bytes) @ 0x00007fca1cd00858 [0x00007fca1cd007c0+0x0000000000000098]
>     > J 7560 c1
>     >
>     java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V
>     > java.base at 13.0.1 (187 bytes) @ 0x00007fca15b23f54
>     > [0x00007fca15b23160+0x0000000000000df4]
>     > J 5143 c1 java.util.concurrent.ThreadPoolExecutor$Worker.run()V
>     > java.base at 13.0.1 (9 bytes) @ 0x00007fca15b39abc
>     > [0x00007fca15b39a40+0x000000000000007c]
>     > J 4488 c1 java.lang.Thread.run()V java.base at 13.0.1 (17 bytes) @
>     > 0x00007fca159fc174 [0x00007fca159fc040+0x0000000000000134]
>     > v  ~StubRoutines::call_stub
>     >
>     > siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr:
>     > 0x0000000000000000
>     >
>     > Register to memory mapping:
>     > ...
>     >
>     > Can someone shed more info on when this can happen? I am seeing
>     this on
>     > multiple servers with Java 13.0.1+9 on RHEL6 servers.
>     >
>     > There was another thread in hotspot runtime where David Holmes
>     pointed this
>     >> siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr:
>     > 0x0000000000000000
>     >
>     >> This seems it may be related to:
>     >> https://bugs.openjdk.java.net/browse/JDK-8004124
>     >
>     > Just wondering if this is same or something to do with GC specific.
>     >
>     >
>     >
>     > TIA
>     > Sundar
>     >
>




More information about the hotspot-gc-dev mailing list