RFR(S): 7147724: G1: hang in SurrogateLockerThread::manipulatePLL

John Cuthbertson john.cuthbertson at oracle.com
Thu Mar 8 18:13:05 UTC 2012


Hi Everyone,

A new version of these changes, which incorporates Bengt's suggestions 
and addresses his questions, can be found at: 
http://cr.openjdk.java.net/~johnc/7147724/webrev.1/

Changes in this version include:
* Identification of another path where a Java thread could lock and wait 
on the SecondaryFreeList_lock without performing a safepoint check.
* Whether a safepoint check is performed when locking/waiting on the 
SecondaryFreeList_lock is now determined by the type of thread: Java 
threads are required to perform the safepoint check.
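To make the thread-type rule concrete, here is a small C++ model. It is a sketch under stated assumptions, not HotSpot code: ThreadKind, needs_safepoint_check() and SecondaryFreeListModel are hypothetical names, std::mutex and std::condition_variable stand in for HotSpot's own Mutex/Monitor classes, and "performing a safepoint check" is approximated by the waiting thread counting itself as safepoint-safe while it is blocked.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical model, not HotSpot code: std::mutex/std::condition_variable
// stand in for HotSpot's Mutex/Monitor, and a counter of "safepoint-safe"
// waiters stands in for a real safepoint check.
enum class ThreadKind { Java, NonJava };

// The rule described above: Java threads perform a safepoint check when
// locking/waiting on SecondaryFreeList_lock; other threads do not.
inline bool needs_safepoint_check(ThreadKind kind) {
  return kind == ThreadKind::Java;
}

class SecondaryFreeListModel {
 public:
  // Called by the (modelled) cleanup side when it starts or finishes
  // placing freed regions on the secondary free list.
  void set_free_regions_coming(bool coming) {
    {
      std::lock_guard<std::mutex> guard(lock_);
      free_regions_coming_ = coming;
    }
    if (!coming) {
      cv_.notify_all();  // wake threads blocked in wait_while_free_regions_coming()
    }
  }

  // Blocks until no more free regions are coming. A thread that needs the
  // safepoint check counts itself as safepoint-safe for as long as it is
  // blocked, so a (modelled) safepoint does not have to wait for it.
  void wait_while_free_regions_coming(ThreadKind kind) {
    const bool check = needs_safepoint_check(kind);
    std::unique_lock<std::mutex> guard(lock_);
    if (check) {
      ++safepoint_safe_waiters_;
    }
    cv_.wait(guard, [this] { return !free_regions_coming_; });
    if (check) {
      --safepoint_safe_waiters_;
    }
    assert(safepoint_safe_waiters_ >= 0);
  }

  // Number of threads currently blocked in a safepoint-safe wait.
  int safepoint_safe_waiters() {
    std::lock_guard<std::mutex> guard(lock_);
    return safepoint_safe_waiters_;
  }

 private:
  std::mutex lock_;
  std::condition_variable cv_;
  bool free_regions_coming_ = false;
  int safepoint_safe_waiters_ = 0;
};
```

In this model a pending safepoint could proceed once every Java thread is either stopped or counted by safepoint_safe_waiters(). A Java thread that waited without the check would never be counted, which corresponds to the indefinite safepoint delay described in the quoted message below.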

Thanks,

JohnC

On 03/05/12 10:37, John Cuthbertson wrote:
> Hi Everyone,
>
> Can I have a couple of volunteers to review the changes for this CR? 
> The webrev can be found at: 
> http://cr.openjdk.java.net/~johnc/7147724/webrev.0/
>
> Summary:
> There are a couple of issues, which look like hangs, that the changes 
> in this CR address.
>
> The first issue is that the initial mark pause requested by a thread 
> attempting to allocate a humongous object could fail, and the thread 
> would then continuously retry the pause (which would continuously 
> fail). There are a couple of reasons for this. When several threads 
> attempting to allocate a humongous object determined that a marking 
> cycle should be initiated, they would race to initiate the initial 
> mark pause. One thread would win, and the losers would fail in 
> VM_G1IncCollectionPause::doit_prologue(). The losers would then keep 
> retrying to schedule the initial mark pause, and keep failing in the 
> prologue, while marking was in progress. Similarly, the initial mark 
> pause itself could fail because the GC locker had just become active. 
> This also caused the requesting thread to continuously retry 
> scheduling the pause, with every attempt failing, for as long as the 
> GC locker was active. Instrumentation showed that the initial mark 
> pause was retried several million times.
>
> The solutions to this issue are to not retry scheduling the initial 
> mark pause for a humongous allocation if a marking cycle is already 
> in progress, and to check whether the GC locker is active before 
> retrying to schedule the initial mark pause.
>
> The other issue is that humongous object allocation would check 
> whether a marking cycle was going to place free regions onto the 
> secondary free list. If so, it would wait on the 
> SecondaryFreeList_lock until the marking cycle had finished freeing 
> the regions. Unfortunately, the thread allocating the humongous object 
> did not perform a safepoint check when locking and waiting on the 
> SecondaryFreeList_lock. As a result, a safepoint could be delayed 
> indefinitely: if the SurrogateLockerThread was already blocked for the 
> safepoint, then the concurrent mark cycle might not be able to 
> complete and so would not finish freeing the regions that the 
> allocating thread was waiting on. The safepoint waits for the 
> allocating thread, which waits for the regions, whose freeing waits 
> on the SurrogateLockerThread, which in turn waits for the safepoint.
>
> The solution for this issue is to perform the safepoint check when 
> locking/waiting on the SecondaryFreeList_lock during humongous object 
> allocation.
>
> Testing:
> * The six hanging nightly tests, executed in a loop.
> * The GC test suite with G1, with and without 
> ExplicitGCInvokesConcurrent, on several machines (including a 2-CPU 
> machine).
> * jprt.
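The retry policy for the first issue in the quoted message can be sketched as a small decision function plus a driver loop. Again this is a hypothetical C++ model rather than the G1 code: PauseResult, NextStep, HeapState, after_initial_mark_attempt() and request_initial_mark() are illustrative names, and the three callbacks stand in for scheduling the VM operation, observing heap state, and stalling until the GC locker is released.

```cpp
#include <cassert>
#include <functional>

// Hypothetical model of the retry policy, not G1 code.
enum class PauseResult { Succeeded, FailedInPrologue, FailedGCLockerActive };
enum class NextStep { Done, GiveUpOnInitialMark, WaitForGCLockerThenRetry, Retry };

struct HeapState {
  bool marking_in_progress;
  bool gc_locker_active;
};

// Decide what to do after one attempt to schedule the initial mark pause.
NextStep after_initial_mark_attempt(PauseResult result, const HeapState& state) {
  if (result == PauseResult::Succeeded) {
    return NextStep::Done;
  }
  // Another thread won the race and marking is under way: scheduling another
  // initial mark pause would only fail again in the prologue.
  if (state.marking_in_progress) {
    return NextStep::GiveUpOnInitialMark;
  }
  // The GC locker is active: retrying immediately would fail again,
  // so wait for the locker to be released first.
  if (state.gc_locker_active) {
    return NextStep::WaitForGCLockerThenRetry;
  }
  return NextStep::Retry;
}

// Driver loop. Returns the final step and reports how many attempts were made.
NextStep request_initial_mark(const std::function<PauseResult()>& try_pause,
                              const std::function<HeapState()>& observe,
                              const std::function<void()>& stall_until_gc_locker_inactive,
                              int& attempts) {
  attempts = 0;
  for (;;) {
    ++attempts;
    const PauseResult result = try_pause();  // call in a fixed order
    const HeapState state = observe();
    const NextStep step = after_initial_mark_attempt(result, state);
    if (step == NextStep::Done || step == NextStep::GiveUpOnInitialMark) {
      return step;
    }
    if (step == NextStep::WaitForGCLockerThenRetry) {
      stall_until_gc_locker_inactive();
      assert(!observe().gc_locker_active);  // the stall must not return early
    }
    // NextStep::Retry, or the GC locker has been released: try again.
  }
}
```

With this policy a thread that loses the race makes one attempt and stops, instead of retrying for as long as marking runs, and a thread blocked by the GC locker retries only after the locker is released. What the allocation path does after giving up on the initial mark pause is outside this sketch.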




More information about the hotspot-gc-dev mailing list