long unaccounted pause during concurrent mark phase of ParNew cycle

atulksh atulksh at hotmail.com
Thu Feb 18 01:16:40 UTC 2010


We have been able to work around this issue with the help of Sun engineers
(especially Ramki - thank you very much). I wanted to update this thread
with that information so it can help anybody else who may run into such
a problem.

What we have been able to identify so far is that there is some sort of
thread synchronization issue during the concurrent mark phase of the CMS
cycle (especially near the end of concurrent mark), where the CMS
threads do not yield control to the GC threads that want to do a scavenge
(young generation collection). The young generation is full and needs a
scavenge, the scavenge cannot start because it is waiting for the CMS
threads to relinquish control, and all application threads wait for the
scavenge to happen. The application goes into a frozen state. Eventually -
after tens of seconds - this wait is broken (we do not know how), the
scavenge happens, things return to normal, and the CMS cycle continues
without further problems. This happens in pretty much every CMS cycle.

The above only happens if more than one concurrent marking thread is
enabled, which in our case happened only on multi-core machines with 4 or
8 cores. 8-core machines showed this problem much more prominently than
4-core machines (where, at least in a test environment, we could reproduce
the problem after manually bumping the ParallelCMSThreads parameter to 4).
The more concurrent CMS threads there are, the more likely the problem is
to occur.
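
To get a rough idea of whether a given machine would run more than one
concurrent marking thread by default, here is a small sketch. The formulas
in it are an assumption on my part, based on my recollection of the HotSpot
defaults (ParallelGCThreads equal to the CPU count up to 8, and
ParallelCMSThreads = (ParallelGCThreads + 3) / 4); they are not something
confirmed in this thread and may differ between JVM versions. The class and
method names are made up for illustration.

```java
public class CmsThreadEstimate {

    // ASSUMPTION: HotSpot default for ParallelGCThreads as I recall it:
    // one thread per CPU up to 8, then 5/8 of a thread per additional CPU.
    static int defaultParallelGcThreads(int cpus) {
        return cpus <= 8 ? cpus : 8 + (cpus - 8) * 5 / 8;
    }

    // ASSUMPTION: default number of concurrent CMS threads derived from
    // ParallelGCThreads as (ParallelGCThreads + 3) / 4, integer division.
    static int defaultConcurrentThreads(int cpus) {
        return (defaultParallelGcThreads(cpus) + 3) / 4;
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println("CPUs visible to the JVM: " + cpus);
        System.out.println("Estimated default concurrent CMS threads: "
                + defaultConcurrentThreads(cpus));
    }
}
```

If these assumed formulas hold, a 4-core machine would default to 1
concurrent thread and an 8-core machine to 2, which would fit what we saw
(the 4-core test machine needed ParallelCMSThreads raised by hand). Please
check the actual values on your own JVM rather than relying on this
estimate.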

Our workaround for the issue was to use the -XX:-CMSConcurrentMTEnabled
flag to disable the use of multiple concurrent CMS threads. So far we have
not seen any significant degradation in GC activity due to disabling
concurrent MT in CMS with this flag.
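
For anyone who wants to confirm that a running JVM was actually started
with the workaround, here is a minimal sketch that inspects the JVM's own
startup arguments. It only looks at what was passed on the command line,
and it assumes that the flag is enabled by default and that a later
occurrence of the flag overrides an earlier one. The class and method names
are made up for illustration.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class CmsFlagCheck {

    static final String FLAG = "CMSConcurrentMTEnabled";

    // Returns true when the last occurrence of the flag in the given
    // JVM arguments is the disabling form (-XX:-CMSConcurrentMTEnabled).
    // ASSUMPTION: the flag is on unless disabled, and the last one wins.
    static boolean concurrentMtDisabled(List<String> jvmArgs) {
        boolean disabled = false;
        for (String arg : jvmArgs) {
            if (arg.equals("-XX:-" + FLAG)) {
                disabled = true;
            } else if (arg.equals("-XX:+" + FLAG)) {
                disabled = false;
            }
        }
        return disabled;
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself, not to the application.
        List<String> jvmArgs =
                ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println(concurrentMtDisabled(jvmArgs)
                ? "Concurrent MT in CMS is disabled on the command line."
                : "Concurrent MT in CMS is NOT disabled on the command line.");
    }
}
```

A launch line with the workaround might look something like
"java -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:-CMSConcurrentMTEnabled -jar app.jar",
where app.jar is just a placeholder for your own application.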

There is an entry in Sun's bug database for the garbage collector that
describes such a problem. As the bug report indicates, Sun engineers are
working on a fix.

http://bugs.sun.com/view_bug.do?bug_id=6692906
-- 
View this message in context: http://old.nabble.com/long-unaccounted-pause-during-concurrent-mark-phase-of-ParNew-cycle-tp27459365p27633143.html
Sent from the OpenJDK Hotspot Garbage Collection mailing list archive at Nabble.com.



