Unexpected issues with Final Mark pauses and pacer performance in JDK11

Zhengyu Gu zgu at redhat.com
Fri Jul 17 13:34:46 UTC 2020


Hi Niall,

By default, class unloading is disabled for Shenandoah in JDK 11, so 
accumulated class loaders may prolong final mark. You can check whether 
loaders are accumulating with:

jcmd <pid> VM.classloader_stats

If that turns out to be the case, you may want to try 
-XX:+ClassUnloadingWithConcurrentMark.
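
For example (the PID and everything apart from the two flags above are 
only illustrative), the check and the restart could look like:

  # Sample the class loader count a few times over a day or two; a count
  # that keeps growing would fit the picture of final mark getting slower.
  jcmd 12345 VM.classloader_stats

  # Restart with concurrent class unloading enabled alongside Shenandoah.
  java -XX:+UnlockExperimentalVMOptions \
       -XX:+UseShenandoahGC \
       -XX:+ClassUnloadingWithConcurrentMark \
       -Xlog:gc \
       -jar app.jar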

Thanks,

-Zhengyu



On 7/17/20 12:56 AM, Connaughton, Niall wrote:
> Hey all, firstly, thanks for all the work on Shenandoah; I’m excited about its potential for us.
> 
> I’ve been doing some testing and have run across a couple of issues that have me scratching my head. The first is Final Mark pauses are increasing steadily over several days. On day 1, Final Mark pauses are 6.5-8.5ms. By day 3, they’re 12-16ms. The heap occupancy is not ramping up, and neither are the concurrent cycle times, so I’m not sure what’s behind this. The application is running a 20GB heap, peaking around 35% live data heap occupancy, and allocating ~1.3-1.5GB/s.
> 
> What angles can I look at to dig into the cause of increasing Final Mark pauses? I don’t see a lot of details on the Final Mark in the gc logs, and there doesn’t seem to be much difference in the logs over time, except for the pause duration increasing. Here’s an example of a Final Mark log for before/after comparison:
> 
> [2020-07-13T22:27:28.835+0000] GC(2224) Pause Final Mark
> [2020-07-13T22:27:28.835+0000] GC(2224) Using 8 of 8 workers for final marking
> [2020-07-13T22:27:28.839+0000] GC(2224) Adaptive CSet Selection. Target Free: 2047M, Actual Free: 2429M, Max CSet: 853M, Min Garbage: 0B
> [2020-07-13T22:27:28.840+0000] GC(2224) Collectable Garbage: 12171M (88% of total), 176M CSet, 1548 CSet regions
> [2020-07-13T22:27:28.840+0000] GC(2224) Immediate Garbage: 46151K (0% of total), 11 regions
> [2020-07-13T22:27:28.843+0000] GC(2224) Pause Final Mark 7.373ms
> 
> [2020-07-15T23:25:05.780+0000] GC(24251) Pause Final Mark
> [2020-07-15T23:25:05.780+0000] GC(24251) Using 8 of 8 workers for final marking
> [2020-07-15T23:25:05.787+0000] GC(24251) Adaptive CSet Selection. Target Free: 2047M, Actual Free: 2513M, Max CSet: 853M, Min Garbage: 0B
> [2020-07-15T23:25:05.787+0000] GC(24251) Collectable Garbage: 12062M (88% of total), 184M CSet, 1535 CSet regions
> [2020-07-15T23:25:05.787+0000] GC(24251) Immediate Garbage: 34711K (0% of total), 5 regions
> [2020-07-15T23:25:05.792+0000] GC(24251) Pause Final Mark 11.790ms
> 
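As a side note on getting more final-mark detail out of the logs: one 
option, sketched here with illustrative selectors and file names, is to 
raise the unified logging level and keep the gc+stats output, e.g.

  java -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC \
       -Xlog:gc*=debug:file=gc-debug.log:time,uptime,level,tags \
       -jar app.jar

The gc+stats summary printed at shutdown breaks the pauses down by 
sub-phase, which may help show where the extra final mark time goes.
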
> 
> The second issue I ran into was that the pacer seemed to be adding a lot of latency. I couldn’t find any traces of the pacer’s activity in the logs. The summary at shutdown from gc+stats is useful, but having some sign of the pacer in the gc logs as the application runs would help correlate against other logs showing latency spikes. Is there any way to get more visibility into the pacer? Disabling the pacer removed the latency impact, and we started seeing some really positive signs on latency. I was expecting that we’d just see more degenerated GCs and that the latency would be similar, but this wasn’t the case.
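
For reference, the pacer can be turned off explicitly; the flag names 
below are the standard Shenandoah and logging options, while the rest of 
the command line is illustrative:

  java -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC \
       -XX:-ShenandoahPacing \
       -Xlog:gc,gc+stats \
       -jar app.jar

With pacing off, allocations that outrun the concurrent collector fall 
through to degenerated GC rather than being stalled by the pacer, which is 
one reason the tail latency profile can look quite different.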
> 
> I’m generally happy running with the pacer disabled, especially as it seems we’ll have more visibility into degenerated GCs than into the pacer, so we can track regressions more easily. So I’m asking this more for understanding than to solve a blocking issue.
> 
> Happy to take any pointers or provide any more info that would help.
> 
> Thanks,
> Niall
> 


