Degenerated GC pauses for 5-10 seconds

Thu Jan 17 20:27:39 UTC 2019

On 1/15/19 10:11 PM, Alexander Yakushev wrote:
> In the second log, it looks like it happened not during the concurrent
> reset, but between the other two phases. From the excerpt (
> https://gist.github.com/alexander-yakushev/549a1a600f53ed61249b8ae8184a3971
> ):
> 
> 2019-01-07T18:02:57.859+0000: 258123.422: [Concurrent update
> referencesUsing 25 of 25 workers for concurrent reference update
>  49G->50G(55G), 95.928 ms]
> Failed to allocate 2048K
> Cancelling GC: Allocation Failure
> 2019-01-07T18:03:03.153+0000: 258128.717: [Pause Final Update RefsUsing 25
> of 25 workers for final reference update
> , 2.967 ms]
> 
> In this case, the delay is not calculated, but there is a 5-second
> difference between two timestamps. I lost the full log for this case,
> unfortunately.

Okay. My theory right now is that ShenandoahControlThread that drives the cycle, and has to react on
what is going on, is deprived of cycles to run. This explains both "Concurrent reset" taking very
long, and the events timestamp lag. We can (should) try to make it less likely and add some logging
to diagnose these better.

Do you know if CPU time is very high (e.g. 100%) when thing like that happens?

Busy Java threads can steal a lot of CPU. *Allocating* Java threads would have to consult Shenandoah
pacer, but would be allowed to proceed anyway after ShenandoahPacingDelay is reached. So, maybe the
workaround is to beef up ShenandoahPacingDelay?

-Aleksey