GC_Locker turning CMS into a stop-the-world collector
Chi Ho Kwok
chkwok at digibites.nl
Mon Jan 18 14:47:41 PST 2010
Hi Doug,
On Mon, Jan 18, 2010 at 9:32 PM, Jones, Doug <doug.jones2 at hp.com> wrote:
> This is a long shot: in the logs below the problem behaviour appears to start in the abortable-preclean phase. That part of the CMS Collection does some interesting things, but can I believe be disabled by setting CMSMaxAbortablePrecleanTime to 0.
>
> You might like to try running your test program with the abortable-preclean phase turned off ...
Thanks, setting it to a very low value does help a lot. The test case
has been changed to increase reporting accuracy: in
DirectBufferStresser.run(), save the time before and after every
buffer operation, and logs every call taking longer than 100ms.
Disabled the other stderr warnings. The log with abortable preclean
time set to zero shows:
All OK:
Mem: 842, executed 450 direct buffer operations, 1 load operations.
CMS Start:
48.033: [GC [1 CMS-initial-mark: 1545976K(1835008K)]
1614526K(2031616K), 0.0135188 secs] [Times: user=0.00 sys=0.00,
real=0.01 secs]
48.047: [CMS-concurrent-mark-start]
Workers got stuck:
Mem: 842, executed 0 direct buffer operations, 3 load operations.
48.505: [CMS-concurrent-mark: 0.458/0.458 secs] [Times: user=0.45
sys=0.00, real=0.46 secs]
48.505: [CMS-concurrent-preclean-start]
48.510: [CMS-concurrent-preclean: 0.004/0.004 secs] [Times: user=0.00
sys=0.00, real=0.00 secs]
Preclean aborted in 0.02s real time:
48.510: [CMS-concurrent-abortable-preclean-start]
CMS: abort preclean due to time 48.530:
[CMS-concurrent-abortable-preclean: 0.020/0.020 secs] [Times:
user=0.03 sys=0.00, real=0.02 secs]
48.530: [GC[YG occupancy: 83340 K (196608 K)]48.530: [Rescan
(parallel) , 0.0095904 secs]48.540: [weak refs processing, 0.0000068
secs] [1 CMS-remark: 1545976K(1835008K)] 1629316K(2031616K), 0.0096730
secs] [Times: user=0.00 sys=0.00, real=0.01 secs]
Sweeping, threads still stuck:
48.540: [CMS-concurrent-sweep-start]
Mem: 842, executed 0 direct buffer operations, 2 load operations.
48.782: [CMS-concurrent-sweep: 0.242/0.242 secs] [Times: user=0.27
sys=0.00, real=0.24 secs]
48.782: [CMS-concurrent-reset-start]
GC is over, CMS-concurrent-reset doesn't block stuff, I guess;
everything is back to normal, but the workers complain about the time
spent inside one buffer operation.
Last buffer operation took 1001 ms
Mem: 684, executed 11 direct buffer operations, 3 load operations.
48.791: [CMS-concurrent-reset: 0.009/0.009 secs] [Times: user=0.00
sys=0.00, real=0.01 secs]
Last buffer operation took 1010 ms
Mem: 684, executed 654 direct buffer operations, 2 load operations.
This is quite a bit better than "Last buffer operation took 5971 ms"
running with the old vm arguments, but still, extrapolating to a 16G
heap, CMS will hold up work for about 8 seconds. Less bad than 20+, so
I'll be applying this workaround to prevent the last few nio calls
being stuck for too long. From what I've read, the abortable preclean
is just doing some work in advance for remark so that doesn't take too
long and trying to wait for a desired occupancy in Eden. I'll just cap
it at one second unless someone yells *don't*.
Chi Ho Kwok
More information about the hotspot-gc-use
mailing list