MXBeans to report pause times seem to indicate the complete GC cycle time

Wed Apr 4 14:56:55 UTC 2018

Hi:

In our Java application, the GarbageCollectorMXBean is used to know the pause time introduced by each GC execution.
For CMS and G1, the time reported by GarbageCollectorMXBean is the time where our application has been actually paused, and doesn't consider the time the GC has ben running concurrently with our application.

However, I've noticed that Shenandoah seem to report the whole time needed for the collection to complete, considering both the pause times and the concurrent times.
IMHO, it should only indicate the pause time, and not the whole GC cycle time.

For instance, the output of our application before and after a GC collection is as follows, which seem to indicate a pause time of 56ms:
policy-lra-baf1mad-mgmt> java gc stats
103 Multi-line response follows.
Collector         Count  Time(ms)
----------------  -----  --------
Shenandoah Major      0         0
Shenandoah Minor      1       109
100 Ok.

policy-lra-baf1mad-mgmt> java gc stats
103 Multi-line response follows.
Collector         Count  Time(ms)
----------------  -----  --------
Shenandoah Major      0         0
Shenandoah Minor      2       165 <<<<<<<<<< 56ms from previous GC execution
100 Ok.

However, when seeing the gc.logs (OpenJDK 8u151 is used in this capture), the seems to show that the actual pause time has been shorter, around 12ms

2018-04-04T21:04:32.576+0200: 48.891: [Pause Init Mark, 3.751 ms] <<<<<<<<<<<<<<<<<<<<
2018-04-04T21:04:32.580+0200: 48.895: [Concurrent marking 238M->238M(4620M), 28.796 ms]
2018-04-04T21:04:32.609+0200: 48.924: [Pause Final MarkTotal Garbage: 147M
Immediate Garbage: 14M, 7 regions (10% of total)
Garbage to be collected: 117M (79% of total), 62 regions
Live objects to be evacuated: 6M
Live/garbage ratio in collected regions: 5%
238M->232M(4620M), 6.671 ms] <<<<<<<<<<<<<<<<<<<<<
2018-04-04T21:04:32.616+0200: 48.931: [Concurrent evacuation 236M->244M(4620M), 1.765 ms]
2018-04-04T21:04:32.618+0200: 48.933: [Pause Init Update Refs, 0.085 ms] <<<<<<<<<<<<<<<<
2018-04-04T21:04:32.618+0200: 48.934: [Concurrent update references  246M->248M(4620M), 11.012 ms]
2018-04-04T21:04:32.630+0200: 48.945: [Pause Final Update Refs 248M->124M(4620M), 1.548 ms] <<<<<<<<<<<
2018-04-04T21:04:32.631+0200: 48.947: [Concurrent reset bitmaps 124M->124M(4620M), 0.419 ms]
Capacity: 4620M, Peak Occupancy: 248M, Lowest Free: 4372M, Free Threshold: 138M

I've force a GC collection with system.gc(), but having -XX:+ExplicitGCInvokesConcurrent

Similarly, the same happens when a periodic GC occurs every 5m.
In the next capture, it apparently indicates a pause of 73ms

policy-lra-baf1mad-mgmt> java gc stats
103 Multi-line response follows.
Collector         Count  Time(ms)
----------------  -----  --------
Shenandoah Major      0         0
Shenandoah Minor     86      5009
100 Ok.
policy-lra-baf1mad-mgmt> java gc stats
103 Multi-line response follows.
Collector         Count  Time(ms)
----------------  -----  --------
Shenandoah Major      0         0
Shenandoah Minor     87      5082
100 Ok.

But the gc.logs indicate around 14ms
2018-04-04T16:46:16.940+0200: 25267.673: [Pause Init Mark, 6.898 ms]
2018-04-04T16:46:16.947+0200: 25267.680: [Concurrent marking 248M->248M(4620M), 37.315 ms]
2018-04-04T16:46:16.985+0200: 25267.718: [Pause Final MarkTotal Garbage: 117M
Immediate Garbage: 14M, 7 regions (20% of total)
Garbage to be collected: 52M (44% of total), 27 regions
Live objects to be evacuated: 1M
Live/garbage ratio in collected regions: 2%
248M->250M(4620M), 4.931 ms]
2018-04-04T16:46:16.990+0200: 25267.723: [Concurrent evacuation 250M->250M(4620M), 0.841 ms]
2018-04-04T16:46:16.991+0200: 25267.725: [Pause Init Update Refs, 0.102 ms]
2018-04-04T16:46:16.991+0200: 25267.725: [Concurrent update references  250M->250M(4620M), 17.746 ms]
2018-04-04T16:46:17.010+0200: 25267.743: [Pause Final Update Refs 250M->196M(4620M), 2.157 ms]
2018-04-04T16:46:17.012+0200: 25267.745: [Concurrent reset bitmaps 196M->196M(4620M), 0.539 ms]

In both cases, our application was mainly idle.
I've observed this with the Open JDK 8u151 distributed in the RHEL 7.5 beta.

I think that Shenandoah GC should only report in the MXBean the time it has paused the Java application, in the same way it is done with G1 and CMS+ParNew, and not the whole processing time (pause + concurrent time)

Thanks,
                Pedro

PEDRO VITON
NOKIA
AAA Specialist
C/ Maria Tubau 9 (28050 Madrid - SPAIN)
M: +34 690 964740 (ext. 2411 5746)
pedro.viton at nokia.com<mailto:pedro.viton at alcatel-lucent.com>