OpenJDK Long GC Safepoints in G1
kirk at kodewerk.com
Wed Apr 27 20:23:57 UTC 2016
Hi Rodrigo,
There are many reasons why a safepoint would take longer than the actual time taken to do the maintenance work (GC included). Most involve threads not arriving at the safepoint in a timely manner. This could be down to how bits of code were JIT’ed. It could be that you have long-running counted loops that prevent a thread from reaching a safepoint. It could be thread scheduling pressure, or that the GC triggered the OS to engage in some other maintenance work. The information from this log can only let you know there is a problem. It doesn’t really tell you what that problem is.
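To make the counted-loop case concrete, here is a minimal sketch (the class name, array size and loop bounds are invented for illustration, not taken from your setup). C2 typically compiles an int-indexed counted loop without a safepoint poll in the loop body, so a thread spinning in such a loop can only stop once the loop finishes, and that time-to-safepoint is charged to the “application threads were stopped” figure on top of the GC work itself:

// Hypothetical example: a thread that spends most of its time in an
// int-indexed counted loop, delaying every safepoint (GC pauses included)
// that is requested while it is running.
public class CountedLoopTtspDemo {
    public static void main(String[] args) throws InterruptedException {
        // ~400MB of longs; should fit a typical default heap.
        final long[] data = new long[50_000_000];

        Thread worker = new Thread(() -> {
            long sum = 0;
            while (true) {
                for (int i = 0; i < data.length; i++) {   // counted loop: int index, known bound
                    sum += data[i];
                }
                if (sum == 42) {                          // keep the work from being optimised away
                    System.out.println(sum);
                }
            }
        });
        worker.setDaemon(true);
        worker.start();

        // Trigger safepoints while the worker is in the loop and compare the
        // reported stop times with the GC pause times.
        for (int i = 0; i < 20; i++) {
            System.gc();
            Thread.sleep(500);
        }
    }
}

On JDK 8 you can see where the time goes by running with -XX:+PrintGCApplicationStoppedTime -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1; the per-safepoint spin, block and sync columns show how long the VM waited for threads to reach the safepoint, as opposed to the vmop column, which is the GC work itself. Where available, -XX:+UseCountedLoopSafepoints adds polls inside counted loops, at some throughput cost.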
Regards,
Kirk
> On Apr 27, 2016, at 12:40 PM, Rodrigo Bruno <rbruno at gsd.inesc-id.pt> wrote:
>
> Dear all,
>
> I am currently deploying a small cassandra cluster using OpenJDK 8 with G1.
>
> When I reduce the eden size (to 1400M) I get minor collections around 40ms.
>
> The problem is that I start noticing lots of >100ms pauses outside GC.
>
> When I look further into it, I notice that most of this pauses occur right after
> a minor GC.
>
> For example:
>
> 2016-04-27T06:59:20.255+0100: 53652.819: Total time for which application threads were stopped: 0.0388308 seconds
> 2016-04-27T06:59:21.572+0100: 53654.136: [GC pause (G1 Evacuation Pause) (young), 0.0384103 secs]
>    [Parallel Time: 33.4 ms, GC Workers: 48]
>       [GC Worker Start (ms): Min: 53654136.0, Avg: 53654136.1, Max: 53654136.3, Diff: 0.3]
>       [Ext Root Scanning (ms): Min: 5.2, Avg: 5.3, Max: 5.5, Diff: 0.3, Sum: 256.7]
>       [Update RS (ms): Min: 4.8, Avg: 5.8, Max: 14.9, Diff: 10.1, Sum: 279.8]
>          [Processed Buffers: Min: 1, Avg: 12.3, Max: 33, Diff: 32, Sum: 591]
>       [Scan RS (ms): Min: 0.0, Avg: 5.8, Max: 6.8, Diff: 6.7, Sum: 277.0]
>       [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
>       [Object Copy (ms): Min: 12.7, Avg: 16.0, Max: 16.5, Diff: 3.8, Sum: 768.2]
>       [Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, Sum: 6.2]
>       [GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, Sum: 4.2]
>       [GC Worker Total (ms): Min: 33.0, Avg: 33.2, Max: 33.4, Diff: 0.4, Sum: 1592.3]
>       [GC Worker End (ms): Min: 53654169.2, Avg: 53654169.3, Max: 53654169.4, Diff: 0.2]
>    [Code Root Fixup: 0.2 ms]
>    [Code Root Migration: 0.0 ms]
>    [Clear CT: 0.6 ms]
>    [Other: 4.2 ms]
>       [Choose CSet: 0.0 ms]
>       [Ref Proc: 0.6 ms]
>       [Ref Enq: 0.0 ms]
>       [Free CSet: 1.3 ms]
>    [Eden: 1400.0M(1400.0M)->0.0B(1400.0M) Survivors: 200.0M->200.0M Heap: 6002.8M(18.0G)->4615.1M(18.0G)]
>  [Times: user=1.47 sys=0.00, real=0.04 secs]
> 2016-04-27T06:59:21.611+0100: 53654.174: Total time for which application threads were stopped: 0.3325235 seconds
>
> This last pause of 332ms was reported at 53654.174 seconds after the JVM started. If a GC started at 53654.136
> (about 40ms before), there is not enough time in between to account for 332ms.
>
> I am assuming this 332ms pause is the duration of the safepoint that ran the GC. Right?
>
> However, I cannot see any obvious reason for the safepoint to take more than 8 times
> the GC duration to execute. Do you have any idea?
>
> Thanks in advance,
> best regards,
> Rodrigo Bruno
>
> --
> Rodrigo Bruno <rbruno at gsd.inesc-id.pt>