Unexpected issues with Final Mark pauses and pacer performance in JDK11

Roman Kennke rkennke at redhat.com
Fri Jul 17 05:10:48 UTC 2020


Hi Niall,

Thanks for reporting this!

Which version of jdk11u/Shenandoah are you using?

Thanks,
Roman


Connaughton, Niall <conniall at amazon.com> wrote on Fri, 17 July 2020,
06:57:

> Hey all, firstly thanks for all the work on Shenandoah. I’m excited about
> its potential for us.
>
> I’ve been doing some testing and have run across a couple of issues that
> have me scratching my head. The first is that Final Mark pauses are increasing
> steadily over several days. On day 1, Final Mark pauses are 6.5-8.5ms. By
> day 3, they’re 12-16ms. The heap occupancy is not ramping up, and neither
> are the concurrent cycle times, so I’m not sure what’s behind this. The
> application is running a 20GB heap, peaking around 35% live data heap
> occupancy, and allocating ~1.3-1.5GB/s.
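>
> (Back-of-envelope, and approximate: ~35% live of a 20GB heap is roughly
> 7GB, leaving about 13GB of free space to allocate into, which at ~1.4GB/s
> is on the order of 9 seconds of allocation headroom per cycle.)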
>
> What angles can I look at to dig into the cause of increasing Final Mark
> pauses? I don’t see a lot of details on the Final Mark in the gc logs, and
> there doesn’t seem to be much difference in the logs over time, except for
> the pause duration increasing. Here’s an example of a Final Mark log for
> before/after comparison:
>
> [2020-07-13T22:27:28.835+0000] GC(2224) Pause Final Mark
> [2020-07-13T22:27:28.835+0000] GC(2224) Using 8 of 8 workers for final
> marking
> [2020-07-13T22:27:28.839+0000] GC(2224) Adaptive CSet Selection. Target
> Free: 2047M, Actual Free: 2429M, Max CSet: 853M, Min Garbage: 0B
> [2020-07-13T22:27:28.840+0000] GC(2224) Collectable Garbage: 12171M (88%
> of total), 176M CSet, 1548 CSet regions
> [2020-07-13T22:27:28.840+0000] GC(2224) Immediate Garbage: 46151K (0% of
> total), 11 regions
> [2020-07-13T22:27:28.843+0000] GC(2224) Pause Final Mark 7.373ms
>
> [2020-07-15T23:25:05.780+0000] GC(24251) Pause Final Mark
> [2020-07-15T23:25:05.780+0000] GC(24251) Using 8 of 8 workers for final
> marking
> [2020-07-15T23:25:05.787+0000] GC(24251) Adaptive CSet Selection. Target
> Free: 2047M, Actual Free: 2513M, Max CSet: 853M, Min Garbage: 0B
> [2020-07-15T23:25:05.787+0000] GC(24251) Collectable Garbage: 12062M (88%
> of total), 184M CSet, 1535 CSet regions
> [2020-07-15T23:25:05.787+0000] GC(24251) Immediate Garbage: 34711K (0% of
> total), 5 regions
> [2020-07-15T23:25:05.792+0000] GC(24251) Pause Final Mark 11.790ms
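>
> (For context, the excerpts above come from unified logging roughly like
>
>   -Xlog:gc*:file=gc.log:time
>
> though that's a sketch rather than our exact command line, and the log
> file name is just a placeholder.)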
>
>
> The second issue I ran into was that the pacer seemed to be adding a lot
> of latency. I couldn’t find any trace of the pacer’s activity in the logs.
> The summary at shutdown from gc+stats is useful, but having some sign of
> the pacer in the gc logs as the application runs would help correlate
> against other logs showing latency spikes. Is there any way to get more
> visibility into the pacer? Disabling the pacer removed that latency impact,
> and we started seeing some really positive latency numbers. I was expecting
> that we’d just see more degenerated GCs with similar latency, but this
> wasn’t the case.
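>
> (By "disabling the pacer" I mean something along these lines; a minimal
> sketch, with the experimental-option unlock and the heap sizes shown as
> assumptions rather than our exact command line:
>
>   java -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC \
>        -XX:-ShenandoahPacing \
>        -Xms20g -Xmx20g ...
>
> ShenandoahPacing is on by default.)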
>
> I’m generally happy running with the pacer disabled, especially as it
> seems we’ll have more visibility into degenerated GCs than into the pacer,
> so we can track regressions more easily. So I’m asking this more for
> understanding than to solve a blocking issue.
>
> Happy to take any pointers or provide any more info that would help.
>
> Thanks,
> Niall
>
>

