Crash when using java debugger and kafka

Thu Oct 31 08:10:00 UTC 2019

On 10/30/19 10:09 PM, Aleksey Shipilev wrote:
> On 10/30/19 8:42 PM, Roman Kennke wrote:
>> Our recent jdk8 development introduced the new barrier model (LRB). I
>> tried your testcase with a recent build and it does not happen for me.
>> Can you try a suitable build from here:
>>
>> https://builds.shipilev.net/openjdk-shenandoah-jdk8/
> 
> I actually tried that, and the hang is gone.
> 
> I can reproduce the hang with 8u232, it indeed livelocks during Final Mark:
> 
> Heuristics ergonomically sets -XX:+ShenandoahImplicitGCInvokesConcurrent
> Trigger: Metadata GC Threshold
> 2.361: [Concurrent reset 88128K->88128K(2016M), 2.955 ms]
> 2.364: [Pause Init Mark (process weakrefs) (unload classes), 6.880 ms]
> 2.371: [Concurrent marking (process weakrefs) (unload classes) 88128K->89152K(2016M), 5.414 ms]
> 2.376: [Concurrent precleaning 89152K->89152K(2016M), 1.677 ms]
> 2.378: [Pause Final Mark (process weakrefs) (unload classes)
> 
> ...when doing JvmtiTagMap::weak_oops_do. I might study how that happens in 8u232, to make sure head
> sh/jdk8 does not work by accident.

Found it. When evacuating the roots, the GC worker threads enter JvmtiTagMap::weak_oops_do
concurrently, and it breaks in all sorts of weird ways when they race to resize the underlying
hash table.

Here is the fix for 8u232:

diff -r 309b496da750 src/share/vm/gc_implementation/shenandoah/shenandoahRootProcessor.cpp

--- a/src/share/vm/gc_implementation/shenandoah/shenandoahRootProcessor.cpp     Thu Oct 10 18:16:48 2019 +0100
+++ b/src/share/vm/gc_implementation/shenandoah/shenandoahRootProcessor.cpp     Thu Oct 31 09:06:51 2019 +0100
@@ -257,11 +257,11 @@
   if (blobs != NULL) {
     ShenandoahWorkerTimingsTracker timer(worker_times, ShenandoahPhaseTimings::CodeCacheRoots, worker_id);
     _coderoots_cset_iterator.possibly_parallel_blobs_do(blobs);
   }

-  if (_evacuation_tasks->is_task_claimed(SHENANDOAH_EVAC_jvmti_oops_do)) {
+  if (!_evacuation_tasks->is_task_claimed(SHENANDOAH_EVAC_jvmti_oops_do)) {
     ShenandoahForwardedIsAliveClosure is_alive;
     ShenandoahWorkerTimingsTracker timer(worker_times, ShenandoahPhaseTimings::JVMTIRoots, worker_id);
     JvmtiExport::weak_oops_do(&is_alive, oops);
   }
 }
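
To show why the missing negation matters, here is a minimal standalone sketch. It is not HotSpot
code: ClaimOnce and count_entries are hypothetical names made up for illustration. It assumes
only that is_task_claimed() returns false to the single caller that wins the claim and true to
everyone else, which is how I understand the claim check used in the patch above to behave.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for a claim-once task flag (not the HotSpot class).
class ClaimOnce {
  std::atomic<bool> _claimed{false};
public:
  // Returns false for the one caller that wins the claim, true for all later callers.
  bool is_task_claimed() { return _claimed.exchange(true); }
};

// Starts `workers` threads and counts how many of them enter the guarded section.
// inverted_check == true models the 8u232 condition, false models the patched one.
int count_entries(int workers, bool inverted_check) {
  assert(workers > 0);
  ClaimOnce task;
  std::atomic<int> entered{0};
  std::vector<std::thread> pool;
  for (int i = 0; i < workers; i++) {
    pool.emplace_back([&task, &entered, inverted_check] {
      bool claimed = task.is_task_claimed();
      bool enter = inverted_check ? claimed : !claimed;
      if (enter) {
        // In the real code this is where the JVMTI weak roots would be processed.
        entered.fetch_add(1);
      }
    });
  }
  for (auto& t : pool) {
    t.join();
  }
  return entered.load();
}
```

With 8 workers, count_entries(8, true) returns 7 and count_entries(8, false) returns 1: under
the inverted check every worker except the one that claimed the task enters the section, while
the patched check lets exactly one worker in.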

It was *accidentally* fixed with the LRB backport:

https://hg.openjdk.java.net/shenandoah/jdk8/hotspot/file/e9d60bdac4b5/src/share/vm/gc_implementation/shenandoah/shenandoahRootProcessor.cpp#l296

Bottom line:
 1) 8u232 is broken with lots of JVMTI tags (which is the case for debugging);
 2) The 8u232 workaround is to treat all JVMTI tags as strongly reachable (-XX:-ClassUnloading);
 3) Current sh/jdk8 is immune to this, by happy accident, and would continue to be immune.

Christopher, does this work for you?

-- 
Thanks,
-Aleksey