Unexpected issues with Final Mark pauses and pacer performance in JDK11

Tue Aug 4 19:08:19 UTC 2020

Forgot to mention, the patch is based on jdk11u.

Thanks,

-Zhengyu

On 8/4/20 3:02 PM, Zhengyu Gu wrote:
> Hi Bern,
> 
> Please find my current patch for adding detail timings for class 
> unloading cleanup works.
> 
> I did not expect string table patch can make meaning impact, because it 
> does not seem that many strings die from early logs.
> 
> -Zhengyu
> 
> 
> On 8/4/20 12:52 PM, Mathiske, Bernd wrote:
>> I should mention that we are using "OpenJDK 11.0.8 with Shenandoah" in 
>> this context and that's why only half the patch applied. In OpenJDK 
>> 15, both parts apply.
>>
>> On 8/4/20, 9:48 AM, "shenandoah-dev on behalf of Mathiske, Bernd" 
>> <shenandoah-dev-retn at openjdk.java.net on behalf of 
>> mathiske at amazon.com> wrote:
>>
>>      That did not help in this case. But thanks for the bug fix anyway!
>>      BTW, I could only apply the string table part of the patch, the 
>> ResolvedMethod part does not apply.
>>
>>      So we've still got that problem with long pauses in final mark. 
>> Doubling parallel GC threads brought them from ~20ms to ~13. This only 
>> happens in final mark with class unloading. I found that there are 
>> three code blocks that could be responsible and maybe I have been 
>> looking at the wrong one.
>>
>>      void ShenandoahHeap::stw_unload_classes(bool full_gc) {
>>        if (!unload_classes()) return;
>>        bool purged_class;
>>
>>      // BLOCK 1
>>        // Unload classes and purge SystemDictionary.
>>        {
>>          ShenandoahGCPhase phase(full_gc ?
>>                                  
>> ShenandoahPhaseTimings::full_gc_purge_class_unload :
>>                                  
>> ShenandoahPhaseTimings::purge_class_unload);
>>          purged_class = SystemDictionary::do_unloading(gc_timer());
>>        }
>>
>>      // BLOCK 2
>>        {
>>          ShenandoahGCPhase phase(full_gc ?
>>                                  
>> ShenandoahPhaseTimings::full_gc_purge_par :
>>                                  ShenandoahPhaseTimings::purge_par);
>>          ShenandoahIsAliveSelector is_alive;
>>          uint num_workers = _workers->active_workers();
>>          ShenandoahParallelCleaningTask 
>> unlink_task(is_alive.is_alive_closure(), true, true, num_workers, 
>> purged_class);
>>          _workers->run_task(&unlink_task);
>>        }
>>
>>      //BLOCK 3
>>        {
>>          ShenandoahGCPhase phase(full_gc ?
>>                                  
>> ShenandoahPhaseTimings::full_gc_purge_cldg :
>>                                  ShenandoahPhaseTimings::purge_cldg);
>>          ClassLoaderDataGraph::purge();
>>        }
>>        // Resize and verify metaspace
>>        MetaspaceGC::compute_new_size();
>>        MetaspaceUtils::verify_metrics();
>>      }
>>
>>      And here are the timings we got in the GC log:
>>      ...
>>        System Purge                      301 us
>>          Unload Classes                17824 us
>>          Weak Roots                      301 us
>>          CLDG                              1 us
>>      ...
>>
>>      I suspect "Unload Classes" reports on BLOCK 1 only and not all 
>> three then? That would explain a lot. I can't find any reference to 
>> "Weak Roots" and "CLDG" seems to reflect BLOCK3, correct?
>>
>>      Bernd
>>
>>      On 7/30/20, 11:24 AM, "shenandoah-dev on behalf of Mathiske, 
>> Bernd" <shenandoah-dev-retn at openjdk.java.net on behalf of 
>> mathiske at amazon.com> wrote:
>>
>>          Thanks! I'll add the patch and we will report back.
>>
>>          On 7/30/20, 11:17 AM, "Zhengyu Gu" <zgu at redhat.com> wrote:
>>
>>              CAUTION: This email originated from outside of the 
>> organization. Do not click links or open attachments unless you can 
>> confirm the sender and know the content is safe.
>>
>>
>>
>>              Hi Bernd,
>>
>>              On 7/30/20 2:12 PM, Mathiske, Bernd wrote:
>>              > Zhengyu,
>>              >
>>              > is this bug you filed directly related?
>>              > https://bugs.openjdk.java.net/browse/JDK-8250841
>>              >
>>              > Do you expect that the patch you also posted might help 
>> here?
>>              > (http://cr.openjdk.java.net/~zgu/JDK-8250841/webrev.00/)
>>
>>              Yes, it is directly related and should help the pause 
>> times, but not
>>              sure by how much.
>>
>>              -Zhengyu
>>
>>              >
>>              > Bernd
>>              >
>>              > On 7/29/20, 1:56 PM, "Zhengyu Gu" <zgu at redhat.com> wrote:
>>              >
>>              >      CAUTION: This email originated from outside of the 
>> organization. Do not click links or open attachments unless you can 
>> confirm the sender and know the content is safe.
>>              >
>>              >
>>              >
>>              >      Hi Niall,
>>              >      >
>>              >      > Based on all of this, we had some followup 
>> questions:
>>              >      >
>>              >      > * Is our understanding of 
>> -XX:+ClassUnloadingWithConcurrentMark correct - is its only impact to 
>> disable class unloading if set to false?
>>              >      Yes, setting ClassUnloadingWithConcurrentMark to 
>> false, still disables
>>              >      class unloading.
>>              >
>>              >      > * Is there a way to get a detailed breakdown of 
>> final mark timing and/or class unloading to expose string/symbol table 
>> cleanup time?
>>              >      Not now. We can add these timings.
>>              >
>>              >      > * Is 16ms a reasonable duration to cleanup 
>> string/symbol tables given ~80K strings and ~310K symbols?
>>              >
>>              >      I don't know. Roman & Aleksey may have some insights.
>>              >
>>              >      > * Apart from increasing 
>> ShenandoahUnloadClassesFrequency, are there any options to reduce the 
>> impact of string/symbol table cleanup on final marking?
>>              >
>>              >      There is possibility that we can move string table 
>> cleaning to
>>              >      concurrent phase. I believe string table is a 
>> concurrent hash table in
>>              >      11u already.
>>              >
>>              >      Thanks,
>>              >
>>              >      -Zhengyu
>>              >
>>              >
>>              >      >
>>              >      >
>>              >      > Thanks very much for your help!
>>              >      >
>>              >      > Niall and Bernd
>>              >      >
>>              >      > On 7/17/20, 08:38, "Zhengyu Gu" <zgu at redhat.com> 
>> wrote:
>>              >      >
>>              >      >      CAUTION: This email originated from outside 
>> of the organization. Do not click links or open attachments unless you 
>> can confirm the sender and know the content is safe.
>>              >      >
>>              >      >
>>              >      >
>>              >      >      On 7/17/20 11:13 AM, Connaughton, Niall wrote:
>>              >      >      > Thanks Zhengyu, I'm looking into this, 
>> I'm seeing a lot of class loaders containing a single class. I'll test 
>> with class unloading enabled. Considering it's disabled by default, is 
>> there a downside to enabling it?
>>              >      >
>>              >      >      class unloading is also performed during 
>> final mark, so it impacts final
>>              >      >      mark pause as well, but probably without 
>> this accumulation effect.
>>              >      >
>>              >      >      We have a parameter, 
>> ShenandoahUnloadClassesFrequency, to control
>>              >      >      frequency of class unloading when it is 
>> enabled, default is once every 5
>>              >      >      GC cycles. You may want to tune the 
>> parameter to achieve some sort of
>>              >      >      balance.
>>              >      >
>>              >      >      Thanks,
>>              >      >
>>              >      >      -Zhengyu
>>              >      >
>>              >      >
>>              >      >
>>              >      >      >
>>              >      >      > Roman & Aleksey, thanks for the details. 
>> I'm using 11.0.7, will setup a new test with an 11.0.8 build.
>>              >      >      >
>>              >      >      > Thanks,
>>              >      >      > Niall
>>              >      >      >
>>              >      >      > On 7/17/20, 06:35, "Zhengyu Gu" 
>> <zgu at redhat.com> wrote:
>>              >      >      >
>>              >      >      >      CAUTION: This email originated from 
>> outside of the organization. Do not click links or open attachments 
>> unless you can confirm the sender and know the content is safe.
>>              >      >      >
>>              >      >      >
>>              >      >      >
>>              >      >      >      Hi Niall,
>>              >      >      >
>>              >      >      >      By default, class unloading is 
>> disabled for Shenandoah in JDK11.
>>              >      >      >      Accumulated class loaders may 
>> prolong final mark. You may want to try:
>>              >      >      >
>>              >      >      >      jcmd <pid> VM.classloader_stats
>>              >      >      >
>>              >      >      >      If it is the case, you may want to 
>> try -XX:+ClassUnloadingWithConcurrentMark
>>              >      >      >
>>              >      >      >      Thanks,
>>              >      >      >
>>              >      >      >      -Zhengyu
>>              >      >      >
>>              >      >      >
>>              >      >      >
>>              >      >      >      On 7/17/20 12:56 AM, Connaughton, 
>> Niall wrote:
>>              >      >      >      > Hey all, firstly thanks for all 
>> the work on Shenandoah, I’m excited for its potential for us.
>>              >      >      >      >
>>              >      >      >      > I’ve been doing some testing and 
>> have run across a couple of issues that have me scratching my head. 
>> The first is Final Mark pauses are increasing steadily over several 
>> days. On day 1, Final Mark pauses are 6.5-8.5ms. By day 3, they’re 
>> 12-16ms. The heap occupancy is not ramping up, and neither are the 
>> concurrent cycle times, so I’m not sure what’s behind this. The 
>> application is running a 20GB heap, peaking around 35% live data heap 
>> occupancy, and allocating ~1.3-1.5GB/s.
>>              >      >      >      >
>>              >      >      >      > What angles can I look at to dig 
>> into the cause of increasing Final Mark pauses? I don’t see a lot of 
>> details on the Final Mark in the gc logs, and there doesn’t seem to be 
>> much difference in the logs over time, except for the pause duration 
>> increasing. Here’s an example of a Final Mark log for before/after 
>> comparison:
>>              >      >      >      >
>>              >      >      >      > [2020-07-13T22:27:28.835+0000] 
>> GC(2224) Pause Final Mark
>>              >      >      >      > [2020-07-13T22:27:28.835+0000] 
>> GC(2224) Using 8 of 8 workers for final marking
>>              >      >      >      > [2020-07-13T22:27:28.839+0000] 
>> GC(2224) Adaptive CSet Selection. Target Free: 2047M, Actual Free: 
>> 2429M, Max CSet: 853M, Min Garbage: 0B
>>              >      >      >      > [2020-07-13T22:27:28.840+0000] 
>> GC(2224) Collectable Garbage: 12171M (88% of total), 176M CSet, 1548 
>> CSet regions
>>              >      >      >      > [2020-07-13T22:27:28.840+0000] 
>> GC(2224) Immediate Garbage: 46151K (0% of total), 11 regions
>>              >      >      >      > [2020-07-13T22:27:28.843+0000] 
>> GC(2224) Pause Final Mark 7.373ms
>>              >      >      >      >
>>              >      >      >      > [2020-07-15T23:25:05.780+0000] 
>> GC(24251) Pause Final Mark
>>              >      >      >      > [2020-07-15T23:25:05.780+0000] 
>> GC(24251) Using 8 of 8 workers for final marking
>>              >      >      >      > [2020-07-15T23:25:05.787+0000] 
>> GC(24251) Adaptive CSet Selection. Target Free: 2047M, Actual Free: 
>> 2513M, Max CSet: 853M, Min Garbage: 0B
>>              >      >      >      > [2020-07-15T23:25:05.787+0000] 
>> GC(24251) Collectable Garbage: 12062M (88% of total), 184M CSet, 1535 
>> CSet regions
>>              >      >      >      > [2020-07-15T23:25:05.787+0000] 
>> GC(24251) Immediate Garbage: 34711K (0% of total), 5 regions
>>              >      >      >      > [2020-07-15T23:25:05.792+0000] 
>> GC(24251) Pause Final Mark 11.790ms
>>              >      >      >      >
>>              >      >      >      >
>>              >      >      >      > The second issue I ran into was 
>> that the pacer seemed to be adding a lot of latency. I couldn’t find 
>> any traces in the logs of the pacer’s activity. The summary at 
>> shutdown from gc+stats is useful, but having some signs of the pacer 
>> in the gc logs as the application runs would help correlate against 
>> other logs showing latency spikes. Is there any way to get more 
>> visibility on the pacer? Disabling the pacer removed the latency 
>> impact and we started seeing some really positive signs on the 
>> latency. I was expecting that we’d just see more degenerated GC and 
>> the latency would be similar, but this wasn’t the case.
>>              >      >      >      >
>>              >      >      >      > I’m generally happy running with 
>> the pacer disabled, especially as it seems we’ll have more visibility 
>> into degenerated GCs than we will over the pacer, so we can track 
>> regression more easily. So I’m asking this more for understanding than 
>> solving a blocking issue.
>>              >      >      >      >
>>              >      >      >      > Happy to take any pointers or 
>> provide any more info that would help.
>>              >      >      >      >
>>              >      >      >      > Thanks,
>>              >      >      >      > Niall
>>              >      >      >      >
>>              >      >      >
>>              >      >      >
>>              >      >
>>              >      >
>>              >
>>              >
>>
>>
>>
>>