RFR (XXS): 7158682: G1: Handle leak when running nsk.sysdict tests
Bengt Rutisson
bengt.rutisson at oracle.com
Mon Apr 30 09:13:18 UTC 2012
Hi John,
Thanks for the detailed explanation. I understand your concern about
worst case overhead for HandleMarks. However, when this code executes we
are anyway in slow path allocation where we take locks, so I'm not so
sure this overhead will be too painful. Also, it is not a very likely case.
So, I would prefer to move the HandleMark closer to where the Handle is
allocated.
I started to look a bit closer to the call graph and came up with this
call hierarchy:
instanceRefKlass::acquire_pending_list_lock()
VM_GC_Operation::acquire_pending_list_lock()
VM_GC_Operation::doit_prologue()
VM_G1OperationWithAllocRequest
VM_G1CollectForAllocation
G1CollectedHeap::mem_allocate() <----- HandleMark
VM_G1IncCollectionPause::doit_prologue()
G1CollectedHeap::do_collection_pause()
G1CollectedHeap::attempt_allocation_slow() <----- HandleMark
G1CollectedHeap::attempt_allocation_humongous() <----- HandleMark
G1CollectedHeap::collect() <----- No HandleMark
instanceRefKlass::acquire_pending_list_lock() is where we create the
Handle. The entry points marked with "<----- HandleMark" is where you
added HandleMarks. I think there might be one missing in
G1CollectedHeap::collect(), right?
Looking at this graph I would prefer to put the HandleMark even closer
to where we create the Handle. In fact, why not put it inside
instanceRefKlass::acquire_pending_list_lock()? Or at least put it in the
VM_GC_Operation::doit_prologue(). After that the code forks and we run
the risk that some path will be missing a HandleMark.
I realize that these two places are shared code and not G1 specific, but
having an extra HandleMark should be fine (apart from the worst case
performance issue that you mentioned before). And who knows, maybe other
collectors (CMS?) are already leaking handles?
Bengt
On 2012-04-27 19:14, John Cuthbertson wrote:
> Hi Bengt,
>
> Thanks for the review and good question (and one I asked myself).
>
> I originally had the HandleMark's in the loop bodies but was concerned
> about the possible worst case performance scenario where a new handle
> area is allocated for handle - which is then freed at the HandleMark
> destructor. When a handle area is freed - we interact with the Chunk
> pools (access to which is serialized using a single global mutex). I
> was able to force this to happen by allocating extra handles before
> the loop.
>
> Without the fixes for 7147724 I would definitely have kept the
> HandleMarks in the loop bodies. Prior to the fixes for 7147724 the
> deadlock would have kept a thread spinning in G1CollectedHeap::collect
> indefinitely, not stalling while the GCLocker was active, and not
> bailing out when a marking cycle was already in progress, caused a
> thread to spin in G1CollectedHeap::collect() millions of times (I
> killed one test program after seeing ~2m retry attempts). I was only
> able to reproduce the OOM when the deadlock occurred (and even then it
> only happened twice).
>
> With the fixes we now only spin if the prologue was unsuccessful and
> it was due to another GC being scheduled (it's caused the GCLocker -
> we stall the thread and then retry the GC) and I saw no more than a
> few hundred iterations around the loop (s).
>
> I'm OK either way - I don't mind moving the HandleMarks. I'll leave
> the choice up to you.
>
> JohnC
>
> On 04/27/12 06:19, Bengt Rutisson wrote:
>>
>> Hi John,
>>
>> I think this looks good, but I have a question for my understanding.
>>
>> You have placed the HandleMark outside loops where we eventually will
>> store a handle to the pending list lock. I guess this will free up
>> all allocated handles once we exit the loop.
>>
>> Are there any issues with putting the HandleMark inside the loop and
>> free up handles one at a time? Is there still a risk of native OOME
>> with your change if we get stuck in one of the allocation loops?
>>
>> Bengt
>>
>> On 2012-04-25 20:08, John Cuthbertson wrote:
>>> Hi Everyone,
>>>
>>> Can I have a couple of volunteers to review this fairly small
>>> change? The webrev can be found at:
>>> http://cr.openjdk.java.net/~johnc/7158682/webrev/
>>>
>>> Summary:
>>> This issue was mainly caused by the excessively high number of GC
>>> retry attempts seen (and fixed) in 7147724. Each time we retry the
>>> GC we allocate a handle to hold the pending list lock and there is
>>> no HandleMark around the code that retries the GC. As a result of
>>> the combination of these two factors, we ran out of C heap and
>>> couldn't allocate a new HandleArea. The fixes for 7147724 address
>>> the excessive retry attempts - but we can still retry the GC if the
>>> prologue operations are not successful and we still don't free the
>>> handles allocated in the prologues of the attempted GC operations.
>>> The changes for this CR address this last issue.
>>>
>>> Many thanks to Mikael Gerdin for his initial diagnosis of the issue.
>>>
>>> Testing:
>>> The failing test cases (with the changes for 7147724 removed and
>>> instrumentation in the HandleMark class); GC test suite with
>>> +ExplicitGCInvokesConcurrent; jprt
>>>
>>> Thanks,
>>>
>>> JohnC
>>
>
More information about the hotspot-gc-dev
mailing list