Unexpected issues with Final Mark pauses and pacer performance in JDK11
Zhengyu Gu
zgu at redhat.com
Tue Aug 4 19:08:19 UTC 2020
Forgot to mention, the patch is based on jdk11u.
Thanks,
-Zhengyu
On 8/4/20 3:02 PM, Zhengyu Gu wrote:
> Hi Bern,
>
> Please find my current patch for adding detail timings for class
> unloading cleanup works.
>
> I did not expect string table patch can make meaning impact, because it
> does not seem that many strings die from early logs.
>
> -Zhengyu
>
>
> On 8/4/20 12:52 PM, Mathiske, Bernd wrote:
>> I should mention that we are using "OpenJDK 11.0.8 with Shenandoah" in
>> this context and that's why only half the patch applied. In OpenJDK
>> 15, both parts apply.
>>
>> On 8/4/20, 9:48 AM, "shenandoah-dev on behalf of Mathiske, Bernd"
>> <shenandoah-dev-retn at openjdk.java.net on behalf of
>> mathiske at amazon.com> wrote:
>>
>> That did not help in this case. But thanks for the bug fix anyway!
>> BTW, I could only apply the string table part of the patch, the
>> ResolvedMethod part does not apply.
>>
>> So we've still got that problem with long pauses in final mark.
>> Doubling parallel GC threads brought them from ~20ms to ~13. This only
>> happens in final mark with class unloading. I found that there are
>> three code blocks that could be responsible and maybe I have been
>> looking at the wrong one.
>>
>> void ShenandoahHeap::stw_unload_classes(bool full_gc) {
>> if (!unload_classes()) return;
>> bool purged_class;
>>
>> // BLOCK 1
>> // Unload classes and purge SystemDictionary.
>> {
>> ShenandoahGCPhase phase(full_gc ?
>>
>> ShenandoahPhaseTimings::full_gc_purge_class_unload :
>>
>> ShenandoahPhaseTimings::purge_class_unload);
>> purged_class = SystemDictionary::do_unloading(gc_timer());
>> }
>>
>> // BLOCK 2
>> {
>> ShenandoahGCPhase phase(full_gc ?
>>
>> ShenandoahPhaseTimings::full_gc_purge_par :
>> ShenandoahPhaseTimings::purge_par);
>> ShenandoahIsAliveSelector is_alive;
>> uint num_workers = _workers->active_workers();
>> ShenandoahParallelCleaningTask
>> unlink_task(is_alive.is_alive_closure(), true, true, num_workers,
>> purged_class);
>> _workers->run_task(&unlink_task);
>> }
>>
>> //BLOCK 3
>> {
>> ShenandoahGCPhase phase(full_gc ?
>>
>> ShenandoahPhaseTimings::full_gc_purge_cldg :
>> ShenandoahPhaseTimings::purge_cldg);
>> ClassLoaderDataGraph::purge();
>> }
>> // Resize and verify metaspace
>> MetaspaceGC::compute_new_size();
>> MetaspaceUtils::verify_metrics();
>> }
>>
>> And here are the timings we got in the GC log:
>> ...
>> System Purge 301 us
>> Unload Classes 17824 us
>> Weak Roots 301 us
>> CLDG 1 us
>> ...
>>
>> I suspect "Unload Classes" reports on BLOCK 1 only and not all
>> three then? That would explain a lot. I can't find any reference to
>> "Weak Roots" and "CLDG" seems to reflect BLOCK3, correct?
>>
>> Bernd
>>
>> On 7/30/20, 11:24 AM, "shenandoah-dev on behalf of Mathiske,
>> Bernd" <shenandoah-dev-retn at openjdk.java.net on behalf of
>> mathiske at amazon.com> wrote:
>>
>> Thanks! I'll add the patch and we will report back.
>>
>> On 7/30/20, 11:17 AM, "Zhengyu Gu" <zgu at redhat.com> wrote:
>>
>> CAUTION: This email originated from outside of the
>> organization. Do not click links or open attachments unless you can
>> confirm the sender and know the content is safe.
>>
>>
>>
>> Hi Bernd,
>>
>> On 7/30/20 2:12 PM, Mathiske, Bernd wrote:
>> > Zhengyu,
>> >
>> > is this bug you filed directly related?
>> > https://bugs.openjdk.java.net/browse/JDK-8250841
>> >
>> > Do you expect that the patch you also posted might help
>> here?
>> > (http://cr.openjdk.java.net/~zgu/JDK-8250841/webrev.00/)
>>
>> Yes, it is directly related and should help the pause
>> times, but not
>> sure by how much.
>>
>> -Zhengyu
>>
>> >
>> > Bernd
>> >
>> > On 7/29/20, 1:56 PM, "Zhengyu Gu" <zgu at redhat.com> wrote:
>> >
>> > CAUTION: This email originated from outside of the
>> organization. Do not click links or open attachments unless you can
>> confirm the sender and know the content is safe.
>> >
>> >
>> >
>> > Hi Niall,
>> > >
>> > > Based on all of this, we had some followup
>> questions:
>> > >
>> > > * Is our understanding of
>> -XX:+ClassUnloadingWithConcurrentMark correct - is its only impact to
>> disable class unloading if set to false?
>> > Yes, setting ClassUnloadingWithConcurrentMark to
>> false, still disables
>> > class unloading.
>> >
>> > > * Is there a way to get a detailed breakdown of
>> final mark timing and/or class unloading to expose string/symbol table
>> cleanup time?
>> > Not now. We can add these timings.
>> >
>> > > * Is 16ms a reasonable duration to cleanup
>> string/symbol tables given ~80K strings and ~310K symbols?
>> >
>> > I don't know. Roman & Aleksey may have some insights.
>> >
>> > > * Apart from increasing
>> ShenandoahUnloadClassesFrequency, are there any options to reduce the
>> impact of string/symbol table cleanup on final marking?
>> >
>> > There is possibility that we can move string table
>> cleaning to
>> > concurrent phase. I believe string table is a
>> concurrent hash table in
>> > 11u already.
>> >
>> > Thanks,
>> >
>> > -Zhengyu
>> >
>> >
>> > >
>> > >
>> > > Thanks very much for your help!
>> > >
>> > > Niall and Bernd
>> > >
>> > > On 7/17/20, 08:38, "Zhengyu Gu" <zgu at redhat.com>
>> wrote:
>> > >
>> > > CAUTION: This email originated from outside
>> of the organization. Do not click links or open attachments unless you
>> can confirm the sender and know the content is safe.
>> > >
>> > >
>> > >
>> > > On 7/17/20 11:13 AM, Connaughton, Niall wrote:
>> > > > Thanks Zhengyu, I'm looking into this,
>> I'm seeing a lot of class loaders containing a single class. I'll test
>> with class unloading enabled. Considering it's disabled by default, is
>> there a downside to enabling it?
>> > >
>> > > class unloading is also performed during
>> final mark, so it impacts final
>> > > mark pause as well, but probably without
>> this accumulation effect.
>> > >
>> > > We have a parameter,
>> ShenandoahUnloadClassesFrequency, to control
>> > > frequency of class unloading when it is
>> enabled, default is once every 5
>> > > GC cycles. You may want to tune the
>> parameter to achieve some sort of
>> > > balance.
>> > >
>> > > Thanks,
>> > >
>> > > -Zhengyu
>> > >
>> > >
>> > >
>> > > >
>> > > > Roman & Aleksey, thanks for the details.
>> I'm using 11.0.7, will setup a new test with an 11.0.8 build.
>> > > >
>> > > > Thanks,
>> > > > Niall
>> > > >
>> > > > On 7/17/20, 06:35, "Zhengyu Gu"
>> <zgu at redhat.com> wrote:
>> > > >
>> > > > CAUTION: This email originated from
>> outside of the organization. Do not click links or open attachments
>> unless you can confirm the sender and know the content is safe.
>> > > >
>> > > >
>> > > >
>> > > > Hi Niall,
>> > > >
>> > > > By default, class unloading is
>> disabled for Shenandoah in JDK11.
>> > > > Accumulated class loaders may
>> prolong final mark. You may want to try:
>> > > >
>> > > > jcmd <pid> VM.classloader_stats
>> > > >
>> > > > If it is the case, you may want to
>> try -XX:+ClassUnloadingWithConcurrentMark
>> > > >
>> > > > Thanks,
>> > > >
>> > > > -Zhengyu
>> > > >
>> > > >
>> > > >
>> > > > On 7/17/20 12:56 AM, Connaughton,
>> Niall wrote:
>> > > > > Hey all, firstly thanks for all
>> the work on Shenandoah, I’m excited for its potential for us.
>> > > > >
>> > > > > I’ve been doing some testing and
>> have run across a couple of issues that have me scratching my head.
>> The first is Final Mark pauses are increasing steadily over several
>> days. On day 1, Final Mark pauses are 6.5-8.5ms. By day 3, they’re
>> 12-16ms. The heap occupancy is not ramping up, and neither are the
>> concurrent cycle times, so I’m not sure what’s behind this. The
>> application is running a 20GB heap, peaking around 35% live data heap
>> occupancy, and allocating ~1.3-1.5GB/s.
>> > > > >
>> > > > > What angles can I look at to dig
>> into the cause of increasing Final Mark pauses? I don’t see a lot of
>> details on the Final Mark in the gc logs, and there doesn’t seem to be
>> much difference in the logs over time, except for the pause duration
>> increasing. Here’s an example of a Final Mark log for before/after
>> comparison:
>> > > > >
>> > > > > [2020-07-13T22:27:28.835+0000]
>> GC(2224) Pause Final Mark
>> > > > > [2020-07-13T22:27:28.835+0000]
>> GC(2224) Using 8 of 8 workers for final marking
>> > > > > [2020-07-13T22:27:28.839+0000]
>> GC(2224) Adaptive CSet Selection. Target Free: 2047M, Actual Free:
>> 2429M, Max CSet: 853M, Min Garbage: 0B
>> > > > > [2020-07-13T22:27:28.840+0000]
>> GC(2224) Collectable Garbage: 12171M (88% of total), 176M CSet, 1548
>> CSet regions
>> > > > > [2020-07-13T22:27:28.840+0000]
>> GC(2224) Immediate Garbage: 46151K (0% of total), 11 regions
>> > > > > [2020-07-13T22:27:28.843+0000]
>> GC(2224) Pause Final Mark 7.373ms
>> > > > >
>> > > > > [2020-07-15T23:25:05.780+0000]
>> GC(24251) Pause Final Mark
>> > > > > [2020-07-15T23:25:05.780+0000]
>> GC(24251) Using 8 of 8 workers for final marking
>> > > > > [2020-07-15T23:25:05.787+0000]
>> GC(24251) Adaptive CSet Selection. Target Free: 2047M, Actual Free:
>> 2513M, Max CSet: 853M, Min Garbage: 0B
>> > > > > [2020-07-15T23:25:05.787+0000]
>> GC(24251) Collectable Garbage: 12062M (88% of total), 184M CSet, 1535
>> CSet regions
>> > > > > [2020-07-15T23:25:05.787+0000]
>> GC(24251) Immediate Garbage: 34711K (0% of total), 5 regions
>> > > > > [2020-07-15T23:25:05.792+0000]
>> GC(24251) Pause Final Mark 11.790ms
>> > > > >
>> > > > >
>> > > > > The second issue I ran into was
>> that the pacer seemed to be adding a lot of latency. I couldn’t find
>> any traces in the logs of the pacer’s activity. The summary at
>> shutdown from gc+stats is useful, but having some signs of the pacer
>> in the gc logs as the application runs would help correlate against
>> other logs showing latency spikes. Is there any way to get more
>> visibility on the pacer? Disabling the pacer removed the latency
>> impact and we started seeing some really positive signs on the
>> latency. I was expecting that we’d just see more degenerated GC and
>> the latency would be similar, but this wasn’t the case.
>> > > > >
>> > > > > I’m generally happy running with
>> the pacer disabled, especially as it seems we’ll have more visibility
>> into degenerated GCs than we will over the pacer, so we can track
>> regression more easily. So I’m asking this more for understanding than
>> solving a blocking issue.
>> > > > >
>> > > > > Happy to take any pointers or
>> provide any more info that would help.
>> > > > >
>> > > > > Thanks,
>> > > > > Niall
>> > > > >
>> > > >
>> > > >
>> > >
>> > >
>> >
>> >
>>
>>
>>
>>
More information about the shenandoah-dev
mailing list