Degenerated GC pauses for 5-10 seconds

Aleksey Shipilev shade at redhat.com
Fri Jan 18 12:06:51 UTC 2019


On 1/18/19 9:57 AM, Alexander Yakushev wrote:
> The service in question is actually not very CPU-bound. At the moment of the pause it had ~35% CPU
> utilization (0.65 la1) over a 10-second period. I can't tell whether CPU utilization spiked for a
> short time within those 10 seconds, but I guess it would reveal itself by raising the aggregate value.
> Plus, the CPU util should have stayed 100% for the whole GC duration for ShenandoahControlThread to
> be denied CPU for so long, right?

Yes, right. Still weird. Are you running in a container? Or maybe you have CPU quotas that prevent
threads from running?
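One quick way to answer the container/quota question is to look at the cgroup CPU controller. This is a sketch of my own (not from the thread), assuming cgroup v1 paths; on cgroup v2 the equivalent file is /sys/fs/cgroup/cpu.max:

```shell
# Sketch: check whether a cgroup CPU quota could be starving JVM GC threads.
# cfs_quota_us = -1 means "no quota"; otherwise quota/period gives the number
# of whole cores the group is allowed to use.
check_cpu_quota() {
  quota_file=/sys/fs/cgroup/cpu/cpu.cfs_quota_us
  period_file=/sys/fs/cgroup/cpu/cpu.cfs_period_us
  if [ -r "$quota_file" ] && [ -r "$period_file" ]; then
    quota=$(cat "$quota_file")
    period=$(cat "$period_file")
    if [ "$quota" = "-1" ]; then
      echo "No CPU quota set"
    else
      # Integer division: whole cores allowed by the quota.
      echo "CPU quota: $((quota / period)) cores (quota=$quota period=$period)"
    fi
  else
    echo "cgroup v1 cpu controller not mounted"
  fi
}

result=$(check_cpu_quota)
echo "$result"
```

If the quota allows fewer cores than the GC's thread count expects, concurrent GC threads can be throttled even while overall utilization looks low.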

> For now, I switched Shenandoah's heuristic to static and gave it a FreeThreshold twice as big as the
> one the adaptive heuristic calculated. I haven't observed these weird pauses again yet, but if I do I will
> bring them here.

Yes, thanks. I'll try to corner Shenandoah locally meanwhile.
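For readers following along, a hedged sketch of the flag combination Alexander describes, as it might look on the command line (the threshold value and jar name are placeholders, not taken from the report; some builds of this era also require unlocking experimental options):

```shell
# Illustrative only: switch Shenandoah from the adaptive to the static
# heuristic and raise the free threshold (placeholder value of 20%).
java -XX:+UseShenandoahGC \
     -XX:+UnlockExperimentalVMOptions \
     -XX:ShenandoahGCHeuristics=static \
     -XX:ShenandoahFreeThreshold=20 \
     -jar service.jar
```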

> On a slightly tangential note, is it possible (or would it be possible) to observe the rate of the
> pacer not only once the process dies (in the final stats), but during runtime? Perhaps an
> MXBean that exposes a rough number of injected pacing delays across all threads would be helpful.

Look at ShenandoahAllocationTrace and ShenandoahAllocationTraceThreshold. They print
warnings to the GC log when stalls are detected. You can also employ the nuclear option: set
ShenandoahPacingDelay=${HUGE_VAL}, for example 30000. Then you should never hit Degen/Full GC;
instead, the allocating thread blocks waiting for GC to complete.
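For reference, a sketch of how those diagnostics could be combined on one command line (flag spellings as given in this thread; the threshold value is a placeholder I chose, and diagnostic flags need unlocking):

```shell
# Illustrative only: enable allocation-stall tracing and set a huge pacing
# delay so allocating threads block instead of triggering Degen/Full GC.
java -XX:+UseShenandoahGC \
     -XX:+UnlockDiagnosticVMOptions \
     -XX:+ShenandoahAllocationTrace \
     -XX:ShenandoahAllocationTraceThreshold=10 \
     -XX:ShenandoahPacingDelay=30000 \
     -jar service.jar
```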

-Aleksey



More information about the shenandoah-dev mailing list