RFR(S): 7147724: G1: hang in SurrogateLockerThread::manipulatePLL
Bengt Rutisson
bengt.rutisson at oracle.com
Fri Mar 9 10:25:34 UTC 2012
Hi John,
I think I like this change, but either I am confused or you forgot to
remove the "bool no_safepoint_check" from:
G1CollectedHeap::humongous_obj_allocate_find_first()
G1CollectedHeap::humongous_obj_allocate()
Did you mean to remove it?
Just a style comment on this piece of code:
bool safepoint_check_flag = Mutex::_no_safepoint_check_flag;
if (Thread::current()->is_Java_thread()) {
safepoint_check_flag = !Mutex::_no_safepoint_check_flag;
}
I understand that we want to use Mutex::_no_safepoint_check_flag since
it is defined. But since there is no Mutex::_safepoint_check_flag or
similar I think we break the abstraction by doing "safepoint_check_flag
= !Mutex::_no_safepoint_check_flag". This kind of exposes that we are
dealing with a bool. So, given this I think I would prefer to change the
code to just:
safepoint_check_flag = Thread::current()->is_Java_thread();
Bengt
On 2012-03-08 19:13, John Cuthbertson wrote:
> Hi Everyone,
>
> A new version of theses changes, incorporating changes based upon the
> suggestions and questions from Bengt, can be found at:
> http://cr.openjdk.java.net/~johnc/7147724/webrev.1/
>
> Changes in this version include:
> * Identification of another path where a Java thread could lock and
> wait on the SecondaryFreeList_lock without checking for a safepoint.
> * The determination of whether to perform a safepoint check when
> locking/waitin on the SecondaryFreeList_lock is made based upon the
> type of thread - for Java threads a safepoint check is required.
>
> Thanks,
>
> JohnC
>
> On 03/05/12 10:37, John Cuthbertson wrote:
>> Hi Everyone,
>>
>> Can I have a couple of volunteers to review the changes for this CR?
>> The webrev can be found at:
>> http://cr.openjdk.java.net/~johnc/7147724/webrev.0/
>>
>> Summary:
>> There are a couple of issues, which look like hangs, that the changes
>> in this CR address.
>>
>> The first issue is that a thread, while attempting to allocate a
>> humongous object, would have the initial mark pause not succeed. It
>> would then continuously retry the pause (which would continously
>> fail). There are a couple of reasons for this. When several threads,
>> while attempting to allocate a humongous object, would determine that
>> a marking cycle was to be initiated - they would race to initiate the
>> initial mark pause. One thread would win and the losers would end up
>> failing the VM_G1IncCollectionPause::doit_prologue(). The losers
>> would then keep retrying to schedule the initial mark pause, and keep
>> failing in the prologue, while marking was in progress. Similarly the
>> initial mark pause itself could fail because the GC locker had just
>> become active. This also had the effect making the requesting thread
>> continuously retrying to schedule the pause and having it fail while
>> the GC locker was active. Instrumentation showed that the initial
>> mark pause was retried several million times.
>>
>> The solution to this issue were to not retry scheduling the initial
>> mark pause for a humongous allocation if a marking cycle was already
>> in progress, and check if the GC locker was active before retrying to
>> schdule the initial mark pause.
>>
>> The other issue is that humongous object allocation would check
>> whether a marking cycle was going to be placing free regions on to
>> the secondary free list. If so then it would wait on the
>> SecondaryFreeList_lock until the marking cycle had completed freeing
>> the regions. Unfortunately the thread allocating the humongous object
>> did not perform a safepoint check when locking and waiting on the
>> SecondaryFreeList_lock. As a result a safepoint could be delayed
>> indefinitely: if the SurrogateLockerThread was already blocked for
>> the safepoint then the concurrent mark cycle may not be able to
>> complete and so finish the freeing of the regions, which the
>> allocating thread is waiting on.
>>
>> The solution for this issue is to perform the safepoint check when
>> locking/waiting on the SecondaryFreeList_lock during humongous object
>> allocation.
>>
>> Testing:
>> * The hanging nightly tests (6) executing in a loop.
>> * The GC test suite with G1 and with and without
>> ExplicitGCInvokesConcurrent on several machines (including a 2-cpu).
>> * jprt.
>
More information about the hotspot-gc-dev
mailing list