Unexpected issues with Final Mark pauses and pacer performance in JDK11

Zhengyu Gu zgu at redhat.com
Tue Aug 4 19:02:15 UTC 2020


Hi Bern,

Please find my current patch for adding detail timings for class 
unloading cleanup works.

I did not expect string table patch can make meaning impact, because it 
does not seem that many strings die from early logs.

-Zhengyu


On 8/4/20 12:52 PM, Mathiske, Bernd wrote:
> I should mention that we are using "OpenJDK 11.0.8 with Shenandoah" in this context and that's why only half the patch applied. In OpenJDK 15, both parts apply.
> 
> On 8/4/20, 9:48 AM, "shenandoah-dev on behalf of Mathiske, Bernd" <shenandoah-dev-retn at openjdk.java.net on behalf of mathiske at amazon.com> wrote:
> 
>      That did not help in this case. But thanks for the bug fix anyway!
>      BTW, I could only apply the string table part of the patch, the ResolvedMethod part does not apply.
> 
>      So we've still got that problem with long pauses in final mark. Doubling parallel GC threads brought them from ~20ms to ~13. This only happens in final mark with class unloading. I found that there are three code blocks that could be responsible and maybe I have been looking at the wrong one.
> 
>      void ShenandoahHeap::stw_unload_classes(bool full_gc) {
>        if (!unload_classes()) return;
>        bool purged_class;
> 
>      // BLOCK 1
>        // Unload classes and purge SystemDictionary.
>        {
>          ShenandoahGCPhase phase(full_gc ?
>                                  ShenandoahPhaseTimings::full_gc_purge_class_unload :
>                                  ShenandoahPhaseTimings::purge_class_unload);
>          purged_class = SystemDictionary::do_unloading(gc_timer());
>        }
> 
>      // BLOCK 2
>        {
>          ShenandoahGCPhase phase(full_gc ?
>                                  ShenandoahPhaseTimings::full_gc_purge_par :
>                                  ShenandoahPhaseTimings::purge_par);
>          ShenandoahIsAliveSelector is_alive;
>          uint num_workers = _workers->active_workers();
>          ShenandoahParallelCleaningTask unlink_task(is_alive.is_alive_closure(), true, true, num_workers, purged_class);
>          _workers->run_task(&unlink_task);
>        }
> 
>      //BLOCK 3
>        {
>          ShenandoahGCPhase phase(full_gc ?
>                                  ShenandoahPhaseTimings::full_gc_purge_cldg :
>                                  ShenandoahPhaseTimings::purge_cldg);
>          ClassLoaderDataGraph::purge();
>        }
>        // Resize and verify metaspace
>        MetaspaceGC::compute_new_size();
>        MetaspaceUtils::verify_metrics();
>      }
> 
>      And here are the timings we got in the GC log:
>      ...
>        System Purge                      301 us
>          Unload Classes                17824 us
>          Weak Roots                      301 us
>          CLDG                              1 us
>      ...
> 
>      I suspect "Unload Classes" reports on BLOCK 1 only and not all three then? That would explain a lot. I can't find any reference to "Weak Roots" and "CLDG" seems to reflect BLOCK3, correct?
> 
>      Bernd
> 
>      On 7/30/20, 11:24 AM, "shenandoah-dev on behalf of Mathiske, Bernd" <shenandoah-dev-retn at openjdk.java.net on behalf of mathiske at amazon.com> wrote:
> 
>          Thanks! I'll add the patch and we will report back.
> 
>          On 7/30/20, 11:17 AM, "Zhengyu Gu" <zgu at redhat.com> wrote:
> 
>              CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
> 
> 
> 
>              Hi Bernd,
> 
>              On 7/30/20 2:12 PM, Mathiske, Bernd wrote:
>              > Zhengyu,
>              >
>              > is this bug you filed directly related?
>              > https://bugs.openjdk.java.net/browse/JDK-8250841
>              >
>              > Do you expect that the patch you also posted might help here?
>              > (http://cr.openjdk.java.net/~zgu/JDK-8250841/webrev.00/)
> 
>              Yes, it is directly related and should help the pause times, but not
>              sure by how much.
> 
>              -Zhengyu
> 
>              >
>              > Bernd
>              >
>              > On 7/29/20, 1:56 PM, "Zhengyu Gu" <zgu at redhat.com> wrote:
>              >
>              >      CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
>              >
>              >
>              >
>              >      Hi Niall,
>              >      >
>              >      > Based on all of this, we had some followup questions:
>              >      >
>              >      > * Is our understanding of -XX:+ClassUnloadingWithConcurrentMark correct - is its only impact to disable class unloading if set to false?
>              >      Yes, setting ClassUnloadingWithConcurrentMark to false, still disables
>              >      class unloading.
>              >
>              >      > * Is there a way to get a detailed breakdown of final mark timing and/or class unloading to expose string/symbol table cleanup time?
>              >      Not now. We can add these timings.
>              >
>              >      > * Is 16ms a reasonable duration to cleanup string/symbol tables given ~80K strings and ~310K symbols?
>              >
>              >      I don't know. Roman & Aleksey may have some insights.
>              >
>              >      > * Apart from increasing ShenandoahUnloadClassesFrequency, are there any options to reduce the impact of string/symbol table cleanup on final marking?
>              >
>              >      There is possibility that we can move string table cleaning to
>              >      concurrent phase. I believe string table is a concurrent hash table in
>              >      11u already.
>              >
>              >      Thanks,
>              >
>              >      -Zhengyu
>              >
>              >
>              >      >
>              >      >
>              >      > Thanks very much for your help!
>              >      >
>              >      > Niall and Bernd
>              >      >
>              >      > On 7/17/20, 08:38, "Zhengyu Gu" <zgu at redhat.com> wrote:
>              >      >
>              >      >      CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
>              >      >
>              >      >
>              >      >
>              >      >      On 7/17/20 11:13 AM, Connaughton, Niall wrote:
>              >      >      > Thanks Zhengyu, I'm looking into this, I'm seeing a lot of class loaders containing a single class. I'll test with class unloading enabled. Considering it's disabled by default, is there a downside to enabling it?
>              >      >
>              >      >      class unloading is also performed during final mark, so it impacts final
>              >      >      mark pause as well, but probably without this accumulation effect.
>              >      >
>              >      >      We have a parameter, ShenandoahUnloadClassesFrequency, to control
>              >      >      frequency of class unloading when it is enabled, default is once every 5
>              >      >      GC cycles. You may want to tune the parameter to achieve some sort of
>              >      >      balance.
>              >      >
>              >      >      Thanks,
>              >      >
>              >      >      -Zhengyu
>              >      >
>              >      >
>              >      >
>              >      >      >
>              >      >      > Roman & Aleksey, thanks for the details. I'm using 11.0.7, will setup a new test with an 11.0.8 build.
>              >      >      >
>              >      >      > Thanks,
>              >      >      > Niall
>              >      >      >
>              >      >      > On 7/17/20, 06:35, "Zhengyu Gu" <zgu at redhat.com> wrote:
>              >      >      >
>              >      >      >      CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
>              >      >      >
>              >      >      >
>              >      >      >
>              >      >      >      Hi Niall,
>              >      >      >
>              >      >      >      By default, class unloading is disabled for Shenandoah in JDK11.
>              >      >      >      Accumulated class loaders may prolong final mark. You may want to try:
>              >      >      >
>              >      >      >      jcmd <pid> VM.classloader_stats
>              >      >      >
>              >      >      >      If it is the case, you may want to try -XX:+ClassUnloadingWithConcurrentMark
>              >      >      >
>              >      >      >      Thanks,
>              >      >      >
>              >      >      >      -Zhengyu
>              >      >      >
>              >      >      >
>              >      >      >
>              >      >      >      On 7/17/20 12:56 AM, Connaughton, Niall wrote:
>              >      >      >      > Hey all, firstly thanks for all the work on Shenandoah, I’m excited for its potential for us.
>              >      >      >      >
>              >      >      >      > I’ve been doing some testing and have run across a couple of issues that have me scratching my head. The first is Final Mark pauses are increasing steadily over several days. On day 1, Final Mark pauses are 6.5-8.5ms. By day 3, they’re 12-16ms. The heap occupancy is not ramping up, and neither are the concurrent cycle times, so I’m not sure what’s behind this. The application is running a 20GB heap, peaking around 35% live data heap occupancy, and allocating ~1.3-1.5GB/s.
>              >      >      >      >
>              >      >      >      > What angles can I look at to dig into the cause of increasing Final Mark pauses? I don’t see a lot of details on the Final Mark in the gc logs, and there doesn’t seem to be much difference in the logs over time, except for the pause duration increasing. Here’s an example of a Final Mark log for before/after comparison:
>              >      >      >      >
>              >      >      >      > [2020-07-13T22:27:28.835+0000] GC(2224) Pause Final Mark
>              >      >      >      > [2020-07-13T22:27:28.835+0000] GC(2224) Using 8 of 8 workers for final marking
>              >      >      >      > [2020-07-13T22:27:28.839+0000] GC(2224) Adaptive CSet Selection. Target Free: 2047M, Actual Free: 2429M, Max CSet: 853M, Min Garbage: 0B
>              >      >      >      > [2020-07-13T22:27:28.840+0000] GC(2224) Collectable Garbage: 12171M (88% of total), 176M CSet, 1548 CSet regions
>              >      >      >      > [2020-07-13T22:27:28.840+0000] GC(2224) Immediate Garbage: 46151K (0% of total), 11 regions
>              >      >      >      > [2020-07-13T22:27:28.843+0000] GC(2224) Pause Final Mark 7.373ms
>              >      >      >      >
>              >      >      >      > [2020-07-15T23:25:05.780+0000] GC(24251) Pause Final Mark
>              >      >      >      > [2020-07-15T23:25:05.780+0000] GC(24251) Using 8 of 8 workers for final marking
>              >      >      >      > [2020-07-15T23:25:05.787+0000] GC(24251) Adaptive CSet Selection. Target Free: 2047M, Actual Free: 2513M, Max CSet: 853M, Min Garbage: 0B
>              >      >      >      > [2020-07-15T23:25:05.787+0000] GC(24251) Collectable Garbage: 12062M (88% of total), 184M CSet, 1535 CSet regions
>              >      >      >      > [2020-07-15T23:25:05.787+0000] GC(24251) Immediate Garbage: 34711K (0% of total), 5 regions
>              >      >      >      > [2020-07-15T23:25:05.792+0000] GC(24251) Pause Final Mark 11.790ms
>              >      >      >      >
>              >      >      >      >
>              >      >      >      > The second issue I ran into was that the pacer seemed to be adding a lot of latency. I couldn’t find any traces in the logs of the pacer’s activity. The summary at shutdown from gc+stats is useful, but having some signs of the pacer in the gc logs as the application runs would help correlate against other logs showing latency spikes. Is there any way to get more visibility on the pacer? Disabling the pacer removed the latency impact and we started seeing some really positive signs on the latency. I was expecting that we’d just see more degenerated GC and the latency would be similar, but this wasn’t the case.
>              >      >      >      >
>              >      >      >      > I’m generally happy running with the pacer disabled, especially as it seems we’ll have more visibility into degenerated GCs than we will over the pacer, so we can track regression more easily. So I’m asking this more for understanding than solving a blocking issue.
>              >      >      >      >
>              >      >      >      > Happy to take any pointers or provide any more info that would help.
>              >      >      >      >
>              >      >      >      > Thanks,
>              >      >      >      > Niall
>              >      >      >      >
>              >      >      >
>              >      >      >
>              >      >
>              >      >
>              >
>              >
> 
> 
> 
> 
-------------- next part --------------
# HG changeset patch
# Parent  102a1d3180f5b26806dd48e44d44020476e2bf4f

diff -r 102a1d3180f5 -r 58d6824f8731 src/hotspot/share/gc/shenandoah/parallelCleaning.cpp
--- a/src/hotspot/share/gc/shenandoah/parallelCleaning.cpp	Tue Aug 04 09:23:49 2020 -0400
+++ b/src/hotspot/share/gc/shenandoah/parallelCleaning.cpp	Tue Aug 04 14:46:22 2020 -0400
@@ -297,42 +297,57 @@
   }
 }
 
-ParallelCleaningTask::ParallelCleaningTask(BoolObjectClosure* is_alive,
+ParallelCleaningTask::ParallelCleaningTask(ShenandoahPhaseTimings::Phase phase,
+                                           BoolObjectClosure* is_alive,
                                            bool process_strings,
                                            bool process_symbols,
                                            uint num_workers,
                                            bool unloading_occurred) :
   AbstractGangTask("Parallel Cleaning"),
+  _phase(phase),
   _string_symbol_task(is_alive, process_strings, process_symbols),
   _code_cache_task(num_workers, is_alive, unloading_occurred),
   _klass_cleaning_task(is_alive),
   _resolved_method_cleaning_task(is_alive)
 {
-
-
 }
 
 // The parallel work done by all worker threads.
 void ParallelCleaningTask::work(uint worker_id) {
-  // Do first pass of code cache cleaning.
-  _code_cache_task.work_first_pass(worker_id);
+  {
+    ShenandoahWorkerTimingsTracker x(_phase, ShenandoahPhaseTimings::CodeCacheRoots, worker_id);
+    // Do first pass of code cache cleaning.
+    _code_cache_task.work_first_pass(worker_id);
 
-  // Let the threads mark that the first pass is done.
-  _code_cache_task.barrier_mark(worker_id);
+    // Let the threads mark that the first pass is done.
+    _code_cache_task.barrier_mark(worker_id);
+  }
 
-  // Clean the Strings and Symbols.
-  _string_symbol_task.work(worker_id);
+  {
+    ShenandoahWorkerTimingsTracker x(_phase, ShenandoahPhaseTimings::StringTableRoots, worker_id);
+    // Clean the Strings and Symbols.
+    _string_symbol_task.work(worker_id);
+  }
 
-  // Clean unreferenced things in the ResolvedMethodTable
-  _resolved_method_cleaning_task.work();
+  {
+    ShenandoahWorkerTimingsTracker x(_phase, ShenandoahPhaseTimings::ResolvedMethodTableRoots, worker_id);
+    // Clean unreferenced things in the ResolvedMethodTable
+    _resolved_method_cleaning_task.work();
+  }
 
-  // Wait for all workers to finish the first code cache cleaning pass.
-  _code_cache_task.barrier_wait(worker_id);
+  {
+    ShenandoahWorkerTimingsTracker x(_phase, ShenandoahPhaseTimings::CodeCacheRoots_2, worker_id);
+    // Wait for all workers to finish the first code cache cleaning pass.
+    _code_cache_task.barrier_wait(worker_id);
 
-  // Do the second code cache cleaning work, which realize on
-  // the liveness information gathered during the first pass.
-  _code_cache_task.work_second_pass(worker_id);
+    // Do the second code cache cleaning work, which realize on
+    // the liveness information gathered during the first pass.
+    _code_cache_task.work_second_pass(worker_id);
+  }
 
-  // Clean all klasses that were not unloaded.
-  _klass_cleaning_task.work();
+  {
+    ShenandoahWorkerTimingsTracker x(_phase, ShenandoahPhaseTimings::CLDGRoots, worker_id);
+    // Clean all klasses that were not unloaded.
+    _klass_cleaning_task.work();
+  }
 }
diff -r 102a1d3180f5 -r 58d6824f8731 src/hotspot/share/gc/shenandoah/parallelCleaning.hpp
--- a/src/hotspot/share/gc/shenandoah/parallelCleaning.hpp	Tue Aug 04 09:23:49 2020 -0400
+++ b/src/hotspot/share/gc/shenandoah/parallelCleaning.hpp	Tue Aug 04 14:46:22 2020 -0400
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2015, 2018, Oracle and/or its affiliates. All rights reserved.
+ * Copyright (c) 2015, 2020, Oracle and/or its affiliates. All rights reserved.
  * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
  *
  * This code is free software; you can redistribute it and/or modify it
@@ -27,6 +27,7 @@
 
 #include "gc/shared/oopStorageParState.hpp"
 #include "gc/shared/workgroup.hpp"
+#include "gc/shenandoah/shenandoahPhaseTimings.hpp"
 
 class StringSymbolTableUnlinkTask : public AbstractGangTask {
 private:
@@ -138,14 +139,16 @@
 // To minimize the remark pause times, the tasks below are done in parallel.
 class ParallelCleaningTask : public AbstractGangTask {
 private:
-  StringSymbolTableUnlinkTask _string_symbol_task;
-  CodeCacheUnloadingTask      _code_cache_task;
-  KlassCleaningTask           _klass_cleaning_task;
-  ResolvedMethodCleaningTask  _resolved_method_cleaning_task;
+  StringSymbolTableUnlinkTask     _string_symbol_task;
+  CodeCacheUnloadingTask          _code_cache_task;
+  KlassCleaningTask               _klass_cleaning_task;
+  ResolvedMethodCleaningTask      _resolved_method_cleaning_task;
+  ShenandoahPhaseTimings::Phase   _phase;
 
 public:
   // The constructor is run in the VMThread.
-  ParallelCleaningTask(BoolObjectClosure* is_alive, bool process_strings, bool process_symbols, uint num_workers, bool unloading_occurred);
+  ParallelCleaningTask(ShenandoahPhaseTimings::Phase phase, BoolObjectClosure* is_alive, bool process_strings,
+    bool process_symbols, uint num_workers, bool unloading_occurred);
 
   // The parallel work done by all worker threads.
   void work(uint worker_id);
diff -r 102a1d3180f5 -r 58d6824f8731 src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp
--- a/src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp	Tue Aug 04 09:23:49 2020 -0400
+++ b/src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp	Tue Aug 04 14:46:22 2020 -0400
@@ -1930,15 +1930,21 @@
   if (!unload_classes()) return;
 
   // Unload classes and purge SystemDictionary.
+  bool purged_class;
   {
     ShenandoahGCSubPhase phase(full_gc ?
                                ShenandoahPhaseTimings::full_gc_purge_class_unload :
                                ShenandoahPhaseTimings::purge_class_unload);
-    bool purged_class = SystemDictionary::do_unloading(gc_timer());
-
+    purged_class = SystemDictionary::do_unloading(gc_timer());
+  }
+  {
+    ShenandoahPhaseTimings::Phase p = full_gc ?
+                                      ShenandoahPhaseTimings::full_gc_purge_cleanup :
+                                      ShenandoahPhaseTimings::purge_cleanup;
+    ShenandoahGCSubPhase phase(p);
     ShenandoahIsAliveSelector is_alive;
     uint num_workers = _workers->active_workers();
-    ParallelCleaningTask unlink_task(is_alive.is_alive_closure(), true, true, num_workers, purged_class);
+    ParallelCleaningTask unlink_task(p, is_alive.is_alive_closure(), true, true, num_workers, purged_class);
     _workers->run_task(&unlink_task);
   }
 
diff -r 102a1d3180f5 -r 58d6824f8731 src/hotspot/share/gc/shenandoah/shenandoahPhaseTimings.cpp
--- a/src/hotspot/share/gc/shenandoah/shenandoahPhaseTimings.cpp	Tue Aug 04 09:23:49 2020 -0400
+++ b/src/hotspot/share/gc/shenandoah/shenandoahPhaseTimings.cpp	Tue Aug 04 14:46:22 2020 -0400
@@ -103,9 +103,9 @@
     case full_gc_update_roots:
     case full_gc_adjust_roots:
     case degen_gc_update_roots:
-    case full_gc_purge_class_unload:
+    case full_gc_purge_cleanup:
     case full_gc_purge_weak_par:
-    case purge_class_unload:
+    case purge_cleanup:
     case purge_weak_par:
     case heap_iteration_roots:
       return true;
diff -r 102a1d3180f5 -r 58d6824f8731 src/hotspot/share/gc/shenandoah/shenandoahPhaseTimings.hpp
--- a/src/hotspot/share/gc/shenandoah/shenandoahPhaseTimings.hpp	Tue Aug 04 09:23:49 2020 -0400
+++ b/src/hotspot/share/gc/shenandoah/shenandoahPhaseTimings.hpp	Tue Aug 04 14:46:22 2020 -0400
@@ -36,6 +36,7 @@
   f(CNT_PREFIX ## TotalWork,                DESC_PREFIX "<total>")                     \
   f(CNT_PREFIX ## ThreadRoots,              DESC_PREFIX "Thread Roots")                \
   f(CNT_PREFIX ## CodeCacheRoots,           DESC_PREFIX "Code Cache Roots")            \
+  f(CNT_PREFIX ## CodeCacheRoots_2,         DESC_PREFIX "Code Cache Roots(2nd Pass)")  \
   f(CNT_PREFIX ## UniverseRoots,            DESC_PREFIX "Universe Roots")              \
   f(CNT_PREFIX ## JNIRoots,                 DESC_PREFIX "JNI Handles Roots")           \
   f(CNT_PREFIX ## JVMTIWeakRoots,           DESC_PREFIX "JVMTI Weak Roots")            \
@@ -79,6 +80,7 @@
   f(weakrefs_process,                               "    Process")                     \
   f(purge,                                          "  System Purge")                  \
   f(purge_class_unload,                             "    Unload Classes")              \
+  f(purge_cleanup,                                  "    Cleanup")                     \
   SHENANDOAH_PAR_PHASE_DO(purge_cu_par_,            "      CU: ", f)                   \
   f(purge_weak_par,                                 "    Weak Roots")                  \
   SHENANDOAH_PAR_PHASE_DO(purge_weak_par_,          "      WR: ", f)                   \
@@ -129,6 +131,7 @@
   f(full_gc_weakrefs_process,                       "      Process")                   \
   f(full_gc_purge,                                  "    System Purge")                \
   f(full_gc_purge_class_unload,                     "      Unload Classes")            \
+  f(full_gc_purge_cleanup,                          "      Cleanup")                   \
   SHENANDOAH_PAR_PHASE_DO(full_gc_purge_cu_par_,    "        CU: ", f)                 \
   f(full_gc_purge_weak_par,                         "      Weak Roots")                \
   SHENANDOAH_PAR_PHASE_DO(full_gc_purge_weak_p_,    "        WR: ", f)                 \


More information about the shenandoah-dev mailing list