Degenerated GC pauses for 5-10 seconds

Fri Jan 18 08:57:17 UTC 2019

On Thu, 17 Jan 2019 at 22:27, Aleksey Shipilev <shade at redhat.com> wrote:

> Okay. My theory right now is that ShenandoahControlThread that drives the
> cycle, and has to react on what is going on, is deprived of cycles to run.
> This explains both "Concurrent reset" taking very long, and the events
> timestamp lag. We can (should) try to make it less likely and add some
> logging to diagnose these better.
>
> Do you know if CPU time is very high (e.g. 100%) when thing like that
> happens?
>
> Busy Java threads can steal a lot of CPU. *Allocating* Java threads would
> have to consult Shenandoah pacer, but would be allowed to proceed anyway
> after ShenandoahPacingDelay is reached. So, maybe the workaround is to beef
> up ShenandoahPacingDelay?
>

The service in question is actually not very CPU-bound. At the moment of
the pause it had ~35% CPU utilization (0.65 la1) over a 10-second period. I
can't rule out a short spike in CPU utilization within that 10-second
window, but I guess it would have shown up as a higher aggregate value.
Plus, CPU utilization would have had to stay at 100% for the whole GC
duration for ShenandoahControlThread to be denied CPU for so long, right?
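
As a rough sanity check of that reasoning, here is a minimal Java sketch of
the arithmetic. It computes the longest stretch of 100% utilization that
could hide inside a sampling window with a given average. The class and
method names are made up for illustration, and it assumes that the
utilization figure is a fraction of total CPU capacity and that the
10-second sample actually covers the pause, neither of which I have
verified.

```java
// Illustrative only: bounds how long the CPU could have been fully
// saturated inside one sampling window, given the window's average.
class SpikeBound {
    // windowSeconds: length of the sampling window
    // avgUtil:       average utilization over the window, as a fraction (0..1)
    // baselineUtil:  assumed utilization outside the spike (0..1), an assumption
    static double maxSaturatedSeconds(double windowSeconds,
                                      double avgUtil,
                                      double baselineUtil) {
        if (baselineUtil >= avgUtil) {
            return 0.0; // the baseline alone already accounts for the average
        }
        // Solve t * 1.0 + (window - t) * baseline = window * avg for t.
        return windowSeconds * (avgUtil - baselineUtil) / (1.0 - baselineUtil);
    }

    public static void main(String[] args) {
        // Best case for the "hidden spike" theory: otherwise idle machine.
        System.out.printf("idle baseline: %.2f s%n",
                maxSaturatedSeconds(10.0, 0.35, 0.0));
        // A guessed 20% baseline outside the spike.
        System.out.printf("20%% baseline: %.2f s%n",
                maxSaturatedSeconds(10.0, 0.35, 0.20));
    }
}
```

Even with a completely idle baseline, a 35% average over 10 seconds leaves
room for at most about 3.5 seconds of full saturation, and under 2 seconds
with a 20% baseline. Under these assumptions that is shorter than the 5-10
second pauses.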

For now, I have switched Shenandoah's heuristics to static and set
FreeThreshold to twice the value that the adaptive heuristics calculated. I
haven't observed these weird pauses again yet, but if I do, I will bring
them here.
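
To double-check that the changed settings are actually in effect, something
like the following sketch can be run inside the service. It uses the
standard HotSpotDiagnosticMXBean; the flag names ShenandoahGCHeuristics and
ShenandoahFreeThreshold are my reading of the settings mentioned above and
may be named differently, or be absent, in other builds. The class name is
made up for illustration.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Illustrative only: prints the current values of a few VM flags.
class ShenandoahFlagCheck {
    static final String UNAVAILABLE = "<not available in this JVM>";

    // Returns the flag's current value, or a placeholder if the flag
    // (or the HotSpot diagnostic bean itself) does not exist in this JVM.
    static String flagValue(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (bean == null) {
                return UNAVAILABLE;
            }
            return bean.getVMOption(name).getValue();
        } catch (IllegalArgumentException e) {
            // Thrown when the option name is unknown to this JVM.
            return UNAVAILABLE;
        }
    }

    public static void main(String[] args) {
        // Flag names are assumptions; adjust them to the build in use.
        String[] flags = {
            "UseShenandoahGC",
            "ShenandoahGCHeuristics",
            "ShenandoahFreeThreshold"
        };
        for (String flag : flags) {
            System.out.println(flag + " = " + flagValue(flag));
        }
    }
}
```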

On a slightly tangential note, is it possible (or would it be possible) to
observe the pacer's rate not only when the process exits (in the final
stats), but also at runtime? Perhaps an MXBean that exposes a rough count
of injected pacing delays across all threads would be helpful.
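
To make the idea more concrete, here is a minimal sketch of the shape such
a bean could take. Everything in it is hypothetical: the interface name,
the attribute names, and the ObjectName are invented for illustration, and
nothing here is an existing Shenandoah API. The counters are fed by hand
in this sketch, whereas in the real thing the pacer would have to update
them.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.LongAdder;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Illustrative only: a hypothetical pacing-stats MXBean, not part of any JDK.
class PacerStatsDemo {
    // Hypothetical management interface (must be public for JMX).
    public interface PacerStatsMXBean {
        long getPacingDelayCount();       // number of injected delays so far
        long getTotalPacingDelayMillis(); // rough sum of delay time
    }

    // LongAdder keeps updates cheap when many threads record delays.
    public static class PacerStats implements PacerStatsMXBean {
        private final LongAdder count = new LongAdder();
        private final LongAdder totalMillis = new LongAdder();

        void recordDelay(long millis) {
            count.increment();
            totalMillis.add(millis);
        }

        @Override public long getPacingDelayCount() { return count.sum(); }
        @Override public long getTotalPacingDelayMillis() { return totalMillis.sum(); }
    }

    // Registers the bean on the platform MBean server under the given name.
    static ObjectName register(PacerStats stats, String name) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName objectName = new ObjectName(name);
            server.registerMBean(stats, objectName);
            return objectName;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Reads one attribute back through JMX, as a monitoring agent would.
    static Object read(ObjectName name, String attribute) {
        try {
            return ManagementFactory.getPlatformMBeanServer()
                    .getAttribute(name, attribute);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PacerStats stats = new PacerStats();
        ObjectName name = register(stats, "example.gc:type=PacerStats");
        stats.recordDelay(10); // simulated delays, in milliseconds
        stats.recordDelay(5);
        System.out.println("delays: " + read(name, "PacingDelayCount"));
        System.out.println("millis: " + read(name, "TotalPacingDelayMillis"));
    }
}
```

With something along these lines, a monitoring agent could poll the two
attributes periodically and derive a pacing rate from the deltas.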

Best regards,
Alex Yakushev