Unexpected issues with Final Mark pauses and pacer performance in JDK11

Wed Aug 5 21:15:55 UTC 2020

Hello all,

Just a quick note from vacation:
- I am quite certain that we won't backport nmethod stuff and concurrent
class unloading to jdk11.
- We may backport native LRB and get concurrent string table, JNI etc
stuff, if it does not require code changes outside Shenandoah. However,
from the looks of it, it doesn't seem likely that it (fully) addresses your
issue.
- Would it be an option to turn off class-unloading altogether? If the
workload is not expected to unload (many) classes, than all that work is
for nothing anyway. Did lowering the frequency of class-unloading to every
100th cycle not help?

Cheers,
Roman

Zhengyu Gu <zgu at redhat.com> schrieb am Mi., 5. Aug. 2020, 21:43:

> Hi Bernd,
> >
> > Now we see that the long poles for STW pauses in this service are the
> code cache and CLDG root tasks in parallel cleaning ("Cleanup").
>
> Yes, that is pretty much what I expect.
>
> If we could subdue those we could consistently get into single-digit ms
> max pause territory. In later OpenJDK releases, we see efforts to make
> such phases concurrent, so there is potential for backports.
>
> That will request to backport nmethod barrier stuffs. IMHO, I don't see
> it will happen.
>
>   Alternatively, in case there are any problems in any of those cases,
> one could possibly split up the sub tasks across multiple GC cycles. Do
> they really all have to run in the same cycle? Are there known
> dependencies that link some tasks into the same cycle? (I suspect such a
> link between the class unloading block and the two code cache scanning
> blocks.)
>
> The main hurdle is that, we have to fix all code caches before mutators
> can resume. To find out which code caches are dead, it has to perform
> class unloading. I don't see how can break them up.
>
> The best we can do, I believe, is to move some of root evacuation to
> concurrent phase, e.g. string table, jni roots, etc. But just for that,
> we (Shenandoah) have to backport native LRB stuffs, and not sure if it
> is enough to achieve your goals.
>
> -Zhengyu
>
>
> >
> > Bernd
> >
> > On 8/4/20, 12:20 PM, "Mathiske, Bernd" <mathiske at amazon.com> wrote:
> >
> >      Zhengyu,
> >
> >      It is wonderful to have that patch. Thank you!
> >      I'll apply it and get some numbers.
> >
> >      Bernd
> >
> >      On 8/4/20, 12:09 PM, "Zhengyu Gu" <zgu at redhat.com> wrote:
> >
> >          CAUTION: This email originated from outside of the
> organization. Do not click links or open attachments unless you can confirm
> the sender and know the content is safe.
> >
> >
> >
> >          Forgot to mention, the patch is based on jdk11u.
> >
> >          Thanks,
> >
> >          -Zhengyu
> >
> >
> >
> >          On 8/4/20 3:02 PM, Zhengyu Gu wrote:
> >          > Hi Bern,
> >          >
> >          > Please find my current patch for adding detail timings for
> class
> >          > unloading cleanup works.
> >          >
> >          > I did not expect string table patch can make meaning impact,
> because it
> >          > does not seem that many strings die from early logs.
> >          >
> >          > -Zhengyu
> >          >
> >          >
> >          > On 8/4/20 12:52 PM, Mathiske, Bernd wrote:
> >          >> I should mention that we are using "OpenJDK 11.0.8 with
> Shenandoah" in
> >          >> this context and that's why only half the patch applied. In
> OpenJDK
> >          >> 15, both parts apply.
> >          >>
> >          >> On 8/4/20, 9:48 AM, "shenandoah-dev on behalf of Mathiske,
> Bernd"
> >          >> <shenandoah-dev-retn at openjdk.java.net on behalf of
> >          >> mathiske at amazon.com> wrote:
> >          >>
> >          >>      That did not help in this case. But thanks for the bug
> fix anyway!
> >          >>      BTW, I could only apply the string table part of the
> patch, the
> >          >> ResolvedMethod part does not apply.
> >          >>
> >          >>      So we've still got that problem with long pauses in
> final mark.
> >          >> Doubling parallel GC threads brought them from ~20ms to ~13.
> This only
> >          >> happens in final mark with class unloading. I found that
> there are
> >          >> three code blocks that could be responsible and maybe I have
> been
> >          >> looking at the wrong one.
> >          >>
> >          >>      void ShenandoahHeap::stw_unload_classes(bool full_gc) {
> >          >>        if (!unload_classes()) return;
> >          >>        bool purged_class;
> >          >>
> >          >>      // BLOCK 1
> >          >>        // Unload classes and purge SystemDictionary.
> >          >>        {
> >          >>          ShenandoahGCPhase phase(full_gc ?
> >          >>
> >          >> ShenandoahPhaseTimings::full_gc_purge_class_unload :
> >          >>
> >          >> ShenandoahPhaseTimings::purge_class_unload);
> >          >>          purged_class =
> SystemDictionary::do_unloading(gc_timer());
> >          >>        }
> >          >>
> >          >>      // BLOCK 2
> >          >>        {
> >          >>          ShenandoahGCPhase phase(full_gc ?
> >          >>
> >          >> ShenandoahPhaseTimings::full_gc_purge_par :
> >          >>
> ShenandoahPhaseTimings::purge_par);
> >          >>          ShenandoahIsAliveSelector is_alive;
> >          >>          uint num_workers = _workers->active_workers();
> >          >>          ShenandoahParallelCleaningTask
> >          >> unlink_task(is_alive.is_alive_closure(), true, true,
> num_workers,
> >          >> purged_class);
> >          >>          _workers->run_task(&unlink_task);
> >          >>        }
> >          >>
> >          >>      //BLOCK 3
> >          >>        {
> >          >>          ShenandoahGCPhase phase(full_gc ?
> >          >>
> >          >> ShenandoahPhaseTimings::full_gc_purge_cldg :
> >          >>
> ShenandoahPhaseTimings::purge_cldg);
> >          >>          ClassLoaderDataGraph::purge();
> >          >>        }
> >          >>        // Resize and verify metaspace
> >          >>        MetaspaceGC::compute_new_size();
> >          >>        MetaspaceUtils::verify_metrics();
> >          >>      }
> >          >>
> >          >>      And here are the timings we got in the GC log:
> >          >>      ...
> >          >>        System Purge                      301 us
> >          >>          Unload Classes                17824 us
> >          >>          Weak Roots                      301 us
> >          >>          CLDG                              1 us
> >          >>      ...
> >          >>
> >          >>      I suspect "Unload Classes" reports on BLOCK 1 only and
> not all
> >          >> three then? That would explain a lot. I can't find any
> reference to
> >          >> "Weak Roots" and "CLDG" seems to reflect BLOCK3, correct?
> >          >>
> >          >>      Bernd
> >          >>
> >          >>      On 7/30/20, 11:24 AM, "shenandoah-dev on behalf of
> Mathiske,
> >          >> Bernd" <shenandoah-dev-retn at openjdk.java.net on behalf of
> >          >> mathiske at amazon.com> wrote:
> >          >>
> >          >>          Thanks! I'll add the patch and we will report back.
> >          >>
> >          >>          On 7/30/20, 11:17 AM, "Zhengyu Gu" <zgu at redhat.com>
> wrote:
> >          >>
> >          >>              CAUTION: This email originated from outside of
> the
> >          >> organization. Do not click links or open attachments unless
> you can
> >          >> confirm the sender and know the content is safe.
> >          >>
> >          >>
> >          >>
> >          >>              Hi Bernd,
> >          >>
> >          >>              On 7/30/20 2:12 PM, Mathiske, Bernd wrote:
> >          >>              > Zhengyu,
> >          >>              >
> >          >>              > is this bug you filed directly related?
> >          >>              >
> https://bugs.openjdk.java.net/browse/JDK-8250841
> >          >>              >
> >          >>              > Do you expect that the patch you also posted
> might help
> >          >> here?
> >          >>              > (
> http://cr.openjdk.java.net/~zgu/JDK-8250841/webrev.00/)
> >          >>
> >          >>              Yes, it is directly related and should help the
> pause
> >          >> times, but not
> >          >>              sure by how much.
> >          >>
> >          >>              -Zhengyu
> >          >>
> >          >>              >
> >          >>              > Bernd
> >          >>              >
> >          >>              > On 7/29/20, 1:56 PM, "Zhengyu Gu" <
> zgu at redhat.com> wrote:
> >          >>              >
> >          >>              >      CAUTION: This email originated from
> outside of the
> >          >> organization. Do not click links or open attachments unless
> you can
> >          >> confirm the sender and know the content is safe.
> >          >>              >
> >          >>              >
> >          >>              >
> >          >>              >      Hi Niall,
> >          >>              >      >
> >          >>              >      > Based on all of this, we had some
> followup
> >          >> questions:
> >          >>              >      >
> >          >>              >      > * Is our understanding of
> >          >> -XX:+ClassUnloadingWithConcurrentMark correct - is its only
> impact to
> >          >> disable class unloading if set to false?
> >          >>              >      Yes, setting
> ClassUnloadingWithConcurrentMark to
> >          >> false, still disables
> >          >>              >      class unloading.
> >          >>              >
> >          >>              >      > * Is there a way to get a detailed
> breakdown of
> >          >> final mark timing and/or class unloading to expose
> string/symbol table
> >          >> cleanup time?
> >          >>              >      Not now. We can add these timings.
> >          >>              >
> >          >>              >      > * Is 16ms a reasonable duration to
> cleanup
> >          >> string/symbol tables given ~80K strings and ~310K symbols?
> >          >>              >
> >          >>              >      I don't know. Roman & Aleksey may have
> some insights.
> >          >>              >
> >          >>              >      > * Apart from increasing
> >          >> ShenandoahUnloadClassesFrequency, are there any options to
> reduce the
> >          >> impact of string/symbol table cleanup on final marking?
> >          >>              >
> >          >>              >      There is possibility that we can move
> string table
> >          >> cleaning to
> >          >>              >      concurrent phase. I believe string table
> is a
> >          >> concurrent hash table in
> >          >>              >      11u already.
> >          >>              >
> >          >>              >      Thanks,
> >          >>              >
> >          >>              >      -Zhengyu
> >          >>              >
> >          >>              >
> >          >>              >      >
> >          >>              >      >
> >          >>              >      > Thanks very much for your help!
> >          >>              >      >
> >          >>              >      > Niall and Bernd
> >          >>              >      >
> >          >>              >      > On 7/17/20, 08:38, "Zhengyu Gu" <
> zgu at redhat.com>
> >          >> wrote:
> >          >>              >      >
> >          >>              >      >      CAUTION: This email originated
> from outside
> >          >> of the organization. Do not click links or open attachments
> unless you
> >          >> can confirm the sender and know the content is safe.
> >          >>              >      >
> >          >>              >      >
> >          >>              >      >
> >          >>              >      >      On 7/17/20 11:13 AM, Connaughton,
> Niall wrote:
> >          >>              >      >      > Thanks Zhengyu, I'm looking
> into this,
> >          >> I'm seeing a lot of class loaders containing a single class.
> I'll test
> >          >> with class unloading enabled. Considering it's disabled by
> default, is
> >          >> there a downside to enabling it?
> >          >>              >      >
> >          >>              >      >      class unloading is also performed
> during
> >          >> final mark, so it impacts final
> >          >>              >      >      mark pause as well, but probably
> without
> >          >> this accumulation effect.
> >          >>              >      >
> >          >>              >      >      We have a parameter,
> >          >> ShenandoahUnloadClassesFrequency, to control
> >          >>              >      >      frequency of class unloading when
> it is
> >          >> enabled, default is once every 5
> >          >>              >      >      GC cycles. You may want to tune
> the
> >          >> parameter to achieve some sort of
> >          >>              >      >      balance.
> >          >>              >      >
> >          >>              >      >      Thanks,
> >          >>              >      >
> >          >>              >      >      -Zhengyu
> >          >>              >      >
> >          >>              >      >
> >          >>              >      >
> >          >>              >      >      >
> >          >>              >      >      > Roman & Aleksey, thanks for the
> details.
> >          >> I'm using 11.0.7, will setup a new test with an 11.0.8 build.
> >          >>              >      >      >
> >          >>              >      >      > Thanks,
> >          >>              >      >      > Niall
> >          >>              >      >      >
> >          >>              >      >      > On 7/17/20, 06:35, "Zhengyu Gu"
> >          >> <zgu at redhat.com> wrote:
> >          >>              >      >      >
> >          >>              >      >      >      CAUTION: This email
> originated from
> >          >> outside of the organization. Do not click links or open
> attachments
> >          >> unless you can confirm the sender and know the content is
> safe.
> >          >>              >      >      >
> >          >>              >      >      >
> >          >>              >      >      >
> >          >>              >      >      >      Hi Niall,
> >          >>              >      >      >
> >          >>              >      >      >      By default, class
> unloading is
> >          >> disabled for Shenandoah in JDK11.
> >          >>              >      >      >      Accumulated class loaders
> may
> >          >> prolong final mark. You may want to try:
> >          >>              >      >      >
> >          >>              >      >      >      jcmd <pid>
> VM.classloader_stats
> >          >>              >      >      >
> >          >>              >      >      >      If it is the case, you may
> want to
> >          >> try -XX:+ClassUnloadingWithConcurrentMark
> >          >>              >      >      >
> >          >>              >      >      >      Thanks,
> >          >>              >      >      >
> >          >>              >      >      >      -Zhengyu
> >          >>              >      >      >
> >          >>              >      >      >
> >          >>              >      >      >
> >          >>              >      >      >      On 7/17/20 12:56 AM,
> Connaughton,
> >          >> Niall wrote:
> >          >>              >      >      >      > Hey all, firstly thanks
> for all
> >          >> the work on Shenandoah, I’m excited for its potential for us.
> >          >>              >      >      >      >
> >          >>              >      >      >      > I’ve been doing some
> testing and
> >          >> have run across a couple of issues that have me scratching
> my head.
> >          >> The first is Final Mark pauses are increasing steadily over
> several
> >          >> days. On day 1, Final Mark pauses are 6.5-8.5ms. By day 3,
> they’re
> >          >> 12-16ms. The heap occupancy is not ramping up, and neither
> are the
> >          >> concurrent cycle times, so I’m not sure what’s behind this.
> The
> >          >> application is running a 20GB heap, peaking around 35% live
> data heap
> >          >> occupancy, and allocating ~1.3-1.5GB/s.
> >          >>              >      >      >      >
> >          >>              >      >      >      > What angles can I look
> at to dig
> >          >> into the cause of increasing Final Mark pauses? I don’t see
> a lot of
> >          >> details on the Final Mark in the gc logs, and there doesn’t
> seem to be
> >          >> much difference in the logs over time, except for the pause
> duration
> >          >> increasing. Here’s an example of a Final Mark log for
> before/after
> >          >> comparison:
> >          >>              >      >      >      >
> >          >>              >      >      >      >
> [2020-07-13T22:27:28.835+0000]
> >          >> GC(2224) Pause Final Mark
> >          >>              >      >      >      >
> [2020-07-13T22:27:28.835+0000]
> >          >> GC(2224) Using 8 of 8 workers for final marking
> >          >>              >      >      >      >
> [2020-07-13T22:27:28.839+0000]
> >          >> GC(2224) Adaptive CSet Selection. Target Free: 2047M, Actual
> Free:
> >          >> 2429M, Max CSet: 853M, Min Garbage: 0B
> >          >>              >      >      >      >
> [2020-07-13T22:27:28.840+0000]
> >          >> GC(2224) Collectable Garbage: 12171M (88% of total), 176M
> CSet, 1548
> >          >> CSet regions
> >          >>              >      >      >      >
> [2020-07-13T22:27:28.840+0000]
> >          >> GC(2224) Immediate Garbage: 46151K (0% of total), 11 regions
> >          >>              >      >      >      >
> [2020-07-13T22:27:28.843+0000]
> >          >> GC(2224) Pause Final Mark 7.373ms
> >          >>              >      >      >      >
> >          >>              >      >      >      >
> [2020-07-15T23:25:05.780+0000]
> >          >> GC(24251) Pause Final Mark
> >          >>              >      >      >      >
> [2020-07-15T23:25:05.780+0000]
> >          >> GC(24251) Using 8 of 8 workers for final marking
> >          >>              >      >      >      >
> [2020-07-15T23:25:05.787+0000]
> >          >> GC(24251) Adaptive CSet Selection. Target Free: 2047M,
> Actual Free:
> >          >> 2513M, Max CSet: 853M, Min Garbage: 0B
> >          >>              >      >      >      >
> [2020-07-15T23:25:05.787+0000]
> >          >> GC(24251) Collectable Garbage: 12062M (88% of total), 184M
> CSet, 1535
> >          >> CSet regions
> >          >>              >      >      >      >
> [2020-07-15T23:25:05.787+0000]
> >          >> GC(24251) Immediate Garbage: 34711K (0% of total), 5 regions
> >          >>              >      >      >      >
> [2020-07-15T23:25:05.792+0000]
> >          >> GC(24251) Pause Final Mark 11.790ms
> >          >>              >      >      >      >
> >          >>              >      >      >      >
> >          >>              >      >      >      > The second issue I ran
> into was
> >          >> that the pacer seemed to be adding a lot of latency. I
> couldn’t find
> >          >> any traces in the logs of the pacer’s activity. The summary
> at
> >          >> shutdown from gc+stats is useful, but having some signs of
> the pacer
> >          >> in the gc logs as the application runs would help correlate
> against
> >          >> other logs showing latency spikes. Is there any way to get
> more
> >          >> visibility on the pacer? Disabling the pacer removed the
> latency
> >          >> impact and we started seeing some really positive signs on
> the
> >          >> latency. I was expecting that we’d just see more degenerated
> GC and
> >          >> the latency would be similar, but this wasn’t the case.
> >          >>              >      >      >      >
> >          >>              >      >      >      > I’m generally happy
> running with
> >          >> the pacer disabled, especially as it seems we’ll have more
> visibility
> >          >> into degenerated GCs than we will over the pacer, so we can
> track
> >          >> regression more easily. So I’m asking this more for
> understanding than
> >          >> solving a blocking issue.
> >          >>              >      >      >      >
> >          >>              >      >      >      > Happy to take any
> pointers or
> >          >> provide any more info that would help.
> >          >>              >      >      >      >
> >          >>              >      >      >      > Thanks,
> >          >>              >      >      >      > Niall
> >          >>              >      >      >      >
> >          >>              >      >      >
> >          >>              >      >      >
> >          >>              >      >
> >          >>              >      >
> >          >>              >
> >          >>              >
> >          >>
> >          >>
> >          >>
> >          >>
> >
> >
> >
>
>