Crash when using java debugger and kafka

Thu Oct 31 10:16:23 UTC 2019

Hmm...not great news.  I pulled the latest build from
https://builds.shipilev.net/openjdk-shenandoah-jdk8/
(openjdk-shenandoah-jdk8-latest-linux-x86_64-release.tar.xz
38M 2019-Oct-30 15:14).

The bug does not occur when running jdb/mvn (as in the test case)...but it
still occurs when debugging from IntelliJ IDEA.  It's slightly better: now
only 1 core locks up :).  Disabling class unloading does seem to stop the
error from occurring.

This is the -verbose:gc log from one instance when it did freeze (and then
I kill -9'd it):

Consider -XX:+ClassUnloadingWithConcurrentMark if large pause times are observed on class-unloading sensitive workloads
Heuristics ergonomically sets -XX:+ExplicitGCInvokesConcurrent
Heuristics ergonomically sets -XX:+ShenandoahImplicitGCInvokesConcurrent
Connected to the target VM, address: '127.0.0.1:0', transport: 'socket'

| 2019-10-31 10:13:19,016 INFO  org.facboy.KafkaTestServer: Starting kafka server.
| 2019-10-31 10:13:19,032 INFO  org.facboy.KafkaTestServer: ZooKeeper instance is successfully started on port 35216
Trigger: Metadata GC Threshold
[Concurrent reset 104M->104M(784M), 0.297 ms]
[Pause Init Mark (process weakrefs) (unload classes), 2.922 ms]
[Concurrent marking (process weakrefs) (unload classes) 105M->105M(784M), 6.062 ms]
[Concurrent precleaning 105M->105M(784M), 0.679 ms]
Disconnected from the target VM, address: '127.0.0.1:0', transport: 'socket'

Process finished with exit code 137 (interrupted by signal 9: SIGKILL)
[Pause Final Mark (process weakrefs) (unload classes)

On Thu, Oct 31, 2019 at 8:10 AM Aleksey Shipilev <shade at redhat.com> wrote:

> On 10/30/19 10:09 PM, Aleksey Shipilev wrote:
> > On 10/30/19 8:42 PM, Roman Kennke wrote:
> >> Our recent jdk8 development introduced the new barrier model (LRB). I
> >> tried your testcase with a recent build and it does not happen for me.
> >> Can you try a suitable build from here:
> >>
> >> https://builds.shipilev.net/openjdk-shenandoah-jdk8/
> >
> > I actually tried that, and the hang is gone.
> >
> > I can reproduce the hang with 8u232, it indeed livelocks during Final Mark:
> >
> > Heuristics ergonomically sets -XX:+ShenandoahImplicitGCInvokesConcurrent
> > Trigger: Metadata GC Threshold
> > 2.361: [Concurrent reset 88128K->88128K(2016M), 2.955 ms]
> > 2.364: [Pause Init Mark (process weakrefs) (unload classes), 6.880 ms]
> > 2.371: [Concurrent marking (process weakrefs) (unload classes) 88128K->89152K(2016M), 5.414 ms]
> > 2.376: [Concurrent precleaning 89152K->89152K(2016M), 1.677 ms]
> > 2.378: [Pause Final Mark (process weakrefs) (unload classes)
> >
> > ...when doing JvmtiTagMap::weak_oops_do. I might study how that happens
> > in 8u232, to make sure head sh/jdk8 does not work by accident.
>
> Found it. We seem to be entering JvmtiTagMap::do_weak_oops by all GC
> threads when evacuating the roots, which breaks in all sorts of weird ways
> when it tries to resize the underlying hash table racily.
>
> Here is the fix for 8u232:
>
> diff -r 309b496da750 src/share/vm/gc_implementation/shenandoah/shenandoahRootProcessor.cpp
> --- a/src/share/vm/gc_implementation/shenandoah/shenandoahRootProcessor.cpp  Thu Oct 10 18:16:48 2019 +0100
> +++ b/src/share/vm/gc_implementation/shenandoah/shenandoahRootProcessor.cpp  Thu Oct 31 09:06:51 2019 +0100
> @@ -257,11 +257,11 @@
>    if (blobs != NULL) {
>      ShenandoahWorkerTimingsTracker timer(worker_times, ShenandoahPhaseTimings::CodeCacheRoots, worker_id);
>      _coderoots_cset_iterator.possibly_parallel_blobs_do(blobs);
>    }
>
> -  if (_evacuation_tasks->is_task_claimed(SHENANDOAH_EVAC_jvmti_oops_do)) {
> +  if (!_evacuation_tasks->is_task_claimed(SHENANDOAH_EVAC_jvmti_oops_do)) {
>      ShenandoahForwardedIsAliveClosure is_alive;
>      ShenandoahWorkerTimingsTracker timer(worker_times, ShenandoahPhaseTimings::JVMTIRoots, worker_id);
>      JvmtiExport::weak_oops_do(&is_alive, oops);
>    }
>  }
>
> It was *accidentally* fixed with LRB backport:
>
>
> https://hg.openjdk.java.net/shenandoah/jdk8/hotspot/file/e9d60bdac4b5/src/share/vm/gc_implementation/shenandoah/shenandoahRootProcessor.cpp#l296
>
> Bottom line:
>  1) 8u232 is broken with lots of JVMTI tags (which is the case for debugging);
>  2) 8u232 workaround is to treat all JVMTI tags as strongly reachable (-XX:-ClassUnloading);
>  3) Current sh/jdk8 is immune to this, by happy accident, and would continue to be immune.
>
> Christopher, does this work for you?
>
> --
> Thanks,
> -Aleksey
>
>
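
For anyone reading the quoted one-character fix later, here is a rough standalone sketch of why the missing negation matters. It is not HotSpot code: ClaimableTask and count_entries are hypothetical names made up for illustration, and the sketch assumes that is_task_claimed returns false to exactly one caller (the one that claims the task) and true to everyone else, which is what the fix implies.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical stand-in for a one-shot task claim; NOT the HotSpot class.
class ClaimableTask {
  std::atomic<bool> claimed_{false};
public:
  // Assumed semantics: returns false for exactly one caller (which thereby
  // claims the task) and true for every other caller.
  bool is_task_claimed() {
    return claimed_.exchange(true);
  }
};

// Starts `workers` threads and returns how many entered the guarded section.
// buggy == true mimics the pre-fix condition (no negation);
// buggy == false mimics the fixed condition (negated).
int count_entries(int workers, bool buggy) {
  ClaimableTask task;
  std::atomic<int> entries{0};
  std::vector<std::thread> pool;
  for (int i = 0; i < workers; ++i) {
    pool.emplace_back([&task, &entries, buggy] {
      bool claimed = task.is_task_claimed();
      bool enter = buggy ? claimed : !claimed;
      if (enter) {
        // In the real VM this is where the non-thread-safe work would run.
        entries.fetch_add(1);
      }
    });
  }
  for (auto& t : pool) {
    t.join();
  }
  return entries.load();
}
```

In this sketch, count_entries(8, false) lets a single worker into the guarded section, while count_entries(8, true) lets in every worker except the one that claimed the task, so anything in that section that is not thread-safe would be run concurrently.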