OpenJDK Long GC Safepoints in G1

Wed Apr 27 10:40:48 UTC 2016

Dear all,

I am currently deploying a small cassandra cluster using OpenJDK 8 with G1.

When I reduce the eden size (to 1400M) I get minor collections around 40ms.

The problem is that I start noticing lots of >100ms pauses outside GC.

When I look further into it, I notice that most of this pauses occur right after
a minor GC.

For example:

2016-04-27T06:59:20.255+0100: 53652.819: Total time for which application threads were stopped: 0.0388308 seconds
2016-04-27T06:59:21.572+0100: 53654.136: [GC pause (G1 Evacuation Pause) (young), 0.0384103 secs]
   [Parallel Time: 33.4 ms, GC Workers: 48]
      [GC Worker Start (ms): Min: 53654136.0, Avg: 53654136.1, Max: 53654136.3, Diff: 0.3]
      [Ext Root Scanning (ms): Min: 5.2, Avg: 5.3, Max: 5.5, Diff: 0.3, Sum: 256.7]
      [Update RS (ms): Min: 4.8, Avg: 5.8, Max: 14.9, Diff: 10.1, Sum: 279.8]
         [Processed Buffers: Min: 1, Avg: 12.3, Max: 33, Diff: 32, Sum: 591]
      [Scan RS (ms): Min: 0.0, Avg: 5.8, Max: 6.8, Diff: 6.7, Sum: 277.0]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      [Object Copy (ms): Min: 12.7, Avg: 16.0, Max: 16.5, Diff: 3.8, Sum: 768.2]
      [Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, Sum: 6.2]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, Sum: 4.2]
      [GC Worker Total (ms): Min: 33.0, Avg: 33.2, Max: 33.4, Diff: 0.4, Sum: 1592.3]
      [GC Worker End (ms): Min: 53654169.2, Avg: 53654169.3, Max: 53654169.4, Diff: 0.2]
   [Code Root Fixup: 0.2 ms]
   [Code Root Migration: 0.0 ms]
   [Clear CT: 0.6 ms]
   [Other: 4.2 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 0.6 ms]
      [Ref Enq: 0.0 ms]
      [Free CSet: 1.3 ms]
   [Eden: 1400.0M(1400.0M)->0.0B(1400.0M) Survivors: 200.0M->200.0M Heap: 6002.8M(18.0G)->4615.1M(18.0G)]
 [Times: user=1.47 sys=0.00, real=0.04 secs]
2016-04-27T06:59:21.611+0100: 53654.174: Total time for which application threads were stopped: 0.3325235 seconds

This last pause of 332ms was reported at 53654.174 seconds after the JVM started. If a GC started at 53654.136
(about 40ms before) there is not enough space to fit 332ms.

I am assuming this 332ms pause is the duration of the safepoint that run the GC. Right?

However, I cannot see any obvious reason for the safepoint to take more than 8 times
the GC duration to execute. Do you have any idea?

Thanks in advance,
best regards,
Rodrigo Bruno

-- 
Rodrigo Bruno <rbruno at gsd.inesc-id.pt>