RFR(S): 7147724: G1: hang in SurrogateLockerThread::manipulatePLL
John Cuthbertson
john.cuthbertson at oracle.com
Thu Mar 8 18:13:05 UTC 2012
Hi Everyone,
A new version of these changes, incorporating changes based upon the
suggestions and questions from Bengt, can be found at:
http://cr.openjdk.java.net/~johnc/7147724/webrev.1/
Changes in this version include:
* Identification of another path where a Java thread could lock and wait
on the SecondaryFreeList_lock without checking for a safepoint.
* The determination of whether to perform a safepoint check when
locking/waiting on the SecondaryFreeList_lock is made based upon the type
of thread - for Java threads a safepoint check is required.
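
As a rough illustration of the second bullet, the decision could be modeled as below. This is only a sketch: ThreadKind, SafepointCheck, and safepoint_check_for are hypothetical stand-ins and not the HotSpot types or the code in the webrev. The treatment of non-Java threads (skipping the check) is an assumption.

```cpp
// Hypothetical stand-ins for illustration; these are NOT HotSpot classes.
enum class ThreadKind { Java, ConcurrentGC, VM };

enum class SafepointCheck { Perform, Skip };

// Java threads must perform a safepoint check while locking/waiting on the
// lock. Other thread kinds are assumed (for this sketch) to skip it.
inline SafepointCheck safepoint_check_for(ThreadKind kind) {
    return kind == ThreadKind::Java ? SafepointCheck::Perform
                                    : SafepointCheck::Skip;
}
```

A caller would pick the locking mode from the result before locking and waiting on the SecondaryFreeList_lock.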
Thanks,
JohnC
On 03/05/12 10:37, John Cuthbertson wrote:
> Hi Everyone,
>
> Can I have a couple of volunteers to review the changes for this CR?
> The webrev can be found at:
> http://cr.openjdk.java.net/~johnc/7147724/webrev.0/
>
> Summary:
> There are a couple of issues, which look like hangs, that the changes
> in this CR address.
>
> The first issue is that a thread attempting to allocate a humongous
> object could have its initial mark pause fail. It would then
> continuously retry the pause (which would continuously fail). There
> are a couple of reasons for this. When several threads, while
> attempting to allocate a humongous object, determined that a marking
> cycle was to be initiated, they would race to initiate the initial
> mark pause. One thread would win and the losers would end up failing
> the VM_G1IncCollectionPause::doit_prologue(). The losers would then
> keep retrying to schedule the initial mark pause, and keep failing in
> the prologue, while marking was in progress. Similarly, the initial
> mark pause itself could fail because the GC locker had just become
> active. This also had the effect of making the requesting thread
> continuously retry scheduling the pause, and having it fail, while
> the GC locker was active. Instrumentation showed that the initial mark
> pause was retried several million times.
>
> The solution to this issue was to not retry scheduling the initial
> mark pause for a humongous allocation if a marking cycle was already
> in progress, and to check if the GC locker was active before retrying
> to schedule the initial mark pause.
>
> The other issue is that humongous object allocation would check
> whether a marking cycle was going to be placing free regions on to the
> secondary free list. If so then it would wait on the
> SecondaryFreeList_lock until the marking cycle had completed freeing
> the regions. Unfortunately the thread allocating the humongous object
> did not perform a safepoint check when locking and waiting on the
> SecondaryFreeList_lock. As a result a safepoint could be delayed
> indefinitely: if the SurrogateLockerThread was already blocked for the
> safepoint then the concurrent mark cycle may not be able to complete
> and so finish the freeing of the regions, which the allocating thread
> is waiting on.
>
> The solution for this issue is to perform the safepoint check when
> locking/waiting on the SecondaryFreeList_lock during humongous object
> allocation.
>
> Testing:
> * The hanging nightly tests (6) executing in a loop.
> * The GC test suite with G1 and with and without
> ExplicitGCInvokesConcurrent on several machines (including a 2-cpu).
> * jprt.
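
The retry rule for the first issue in the quoted summary above might be sketched roughly as follows. All names here (PauseAttempt, NextStep, after_failed_initial_mark) are hypothetical and exist only for this illustration. In particular, the summary only says the GC locker is checked before retrying; treating an active GC locker as "wait, then retry" is an assumption of this sketch.

```cpp
// Hypothetical model of the retry decision; names are illustrative only
// and do not correspond to identifiers in the webrev.
struct PauseAttempt {
    bool humongous_allocation;  // request came from a humongous allocation
    bool marking_in_progress;   // a marking cycle is already in progress
    bool gc_locker_active;      // the GC locker is currently active
};

enum class NextStep { Retry, WaitForGCLocker, DoNotRetry };

// Decide what to do after an attempt to schedule the initial mark pause failed.
inline NextStep after_failed_initial_mark(const PauseAttempt& a) {
    // Another thread won the race and marking is under way: retrying would
    // only keep failing in the prologue, so stop.
    if (a.humongous_allocation && a.marking_in_progress) {
        return NextStep::DoNotRetry;
    }
    // Assumption: while the GC locker is active, hold off instead of
    // rescheduling the pause in a tight loop.
    if (a.gc_locker_active) {
        return NextStep::WaitForGCLocker;
    }
    return NextStep::Retry;
}
```

The point of the sketch is that neither failing condition leads straight back to an immediate retry, which is what produced the millions of retries seen in the instrumentation.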