RFR (XXS): 7158682: G1: Handle leak when running nsk.sysdict tests
Bengt Rutisson
bengt.rutisson at oracle.com
Mon Apr 30 11:32:56 UTC 2012
Got some good pointer from Mikael Gerdin. There are actually two handles
created. One in instanceRefKlass::acquire_pending_list_lock() and one in
instanceRefKlass::release_and_notify_pending_list_lock(). So if we go
for HandleMark inside instanceRefKlass::acquire_pending_list_lock() we
also need to put one inside of
instanceRefKlass::release_and_notify_pending_list_lock().
The nice part about doing that is that we will fix this for CMS as well.
Currently CMS leaks two handles each remark phase.
Bengt
On 2012-04-30 11:13, Bengt Rutisson wrote:
>
> Hi John,
>
> Thanks for the detailed explanation. I understand your concern about
> worst case overhead for HandleMarks. However, when this code executes
> we are anyway in slow path allocation where we take locks, so I'm not
> so sure this overhead will be too painful. Also, it is not a very
> likely case.
>
> So, I would prefer to move the HandleMark closer to where the Handle
> is allocated.
>
> I started to look a bit closer to the call graph and came up with this
> call hierarchy:
>
> instanceRefKlass::acquire_pending_list_lock()
> VM_GC_Operation::acquire_pending_list_lock()
> VM_GC_Operation::doit_prologue()
> VM_G1OperationWithAllocRequest
> VM_G1CollectForAllocation
> G1CollectedHeap::mem_allocate() <----- HandleMark
> VM_G1IncCollectionPause::doit_prologue()
> G1CollectedHeap::do_collection_pause()
> G1CollectedHeap::attempt_allocation_slow() <----- HandleMark
> G1CollectedHeap::attempt_allocation_humongous() <-----
> HandleMark
> G1CollectedHeap::collect() <----- No HandleMark
>
> instanceRefKlass::acquire_pending_list_lock() is where we create the
> Handle. The entry points marked with "<----- HandleMark" is where you
> added HandleMarks. I think there might be one missing in
> G1CollectedHeap::collect(), right?
>
> Looking at this graph I would prefer to put the HandleMark even closer
> to where we create the Handle. In fact, why not put it inside
> instanceRefKlass::acquire_pending_list_lock()? Or at least put it in
> the VM_GC_Operation::doit_prologue(). After that the code forks and we
> run the risk that some path will be missing a HandleMark.
>
> I realize that these two places are shared code and not G1 specific,
> but having an extra HandleMark should be fine (apart from the worst
> case performance issue that you mentioned before). And who knows,
> maybe other collectors (CMS?) are already leaking handles?
>
> Bengt
>
>
>
> On 2012-04-27 19:14, John Cuthbertson wrote:
>> Hi Bengt,
>>
>> Thanks for the review and good question (and one I asked myself).
>>
>> I originally had the HandleMark's in the loop bodies but was
>> concerned about the possible worst case performance scenario where a
>> new handle area is allocated for handle - which is then freed at the
>> HandleMark destructor. When a handle area is freed - we interact with
>> the Chunk pools (access to which is serialized using a single global
>> mutex). I was able to force this to happen by allocating extra
>> handles before the loop.
>>
>> Without the fixes for 7147724 I would definitely have kept the
>> HandleMarks in the loop bodies. Prior to the fixes for 7147724 the
>> deadlock would have kept a thread spinning in
>> G1CollectedHeap::collect indefinitely, not stalling while the
>> GCLocker was active, and not bailing out when a marking cycle was
>> already in progress, caused a thread to spin in
>> G1CollectedHeap::collect() millions of times (I killed one test
>> program after seeing ~2m retry attempts). I was only able to
>> reproduce the OOM when the deadlock occurred (and even then it only
>> happened twice).
>>
>> With the fixes we now only spin if the prologue was unsuccessful and
>> it was due to another GC being scheduled (it's caused the GCLocker -
>> we stall the thread and then retry the GC) and I saw no more than a
>> few hundred iterations around the loop (s).
>>
>> I'm OK either way - I don't mind moving the HandleMarks. I'll leave
>> the choice up to you.
>>
>> JohnC
>>
>> On 04/27/12 06:19, Bengt Rutisson wrote:
>>>
>>> Hi John,
>>>
>>> I think this looks good, but I have a question for my understanding.
>>>
>>> You have placed the HandleMark outside loops where we eventually
>>> will store a handle to the pending list lock. I guess this will free
>>> up all allocated handles once we exit the loop.
>>>
>>> Are there any issues with putting the HandleMark inside the loop and
>>> free up handles one at a time? Is there still a risk of native OOME
>>> with your change if we get stuck in one of the allocation loops?
>>>
>>> Bengt
>>>
>>> On 2012-04-25 20:08, John Cuthbertson wrote:
>>>> Hi Everyone,
>>>>
>>>> Can I have a couple of volunteers to review this fairly small
>>>> change? The webrev can be found at:
>>>> http://cr.openjdk.java.net/~johnc/7158682/webrev/
>>>>
>>>> Summary:
>>>> This issue was mainly caused by the excessively high number of GC
>>>> retry attempts seen (and fixed) in 7147724. Each time we retry the
>>>> GC we allocate a handle to hold the pending list lock and there is
>>>> no HandleMark around the code that retries the GC. As a result of
>>>> the combination of these two factors, we ran out of C heap and
>>>> couldn't allocate a new HandleArea. The fixes for 7147724 address
>>>> the excessive retry attempts - but we can still retry the GC if the
>>>> prologue operations are not successful and we still don't free the
>>>> handles allocated in the prologues of the attempted GC operations.
>>>> The changes for this CR address this last issue.
>>>>
>>>> Many thanks to Mikael Gerdin for his initial diagnosis of the issue.
>>>>
>>>> Testing:
>>>> The failing test cases (with the changes for 7147724 removed and
>>>> instrumentation in the HandleMark class); GC test suite with
>>>> +ExplicitGCInvokesConcurrent; jprt
>>>>
>>>> Thanks,
>>>>
>>>> JohnC
>>>
>>
>
More information about the hotspot-gc-dev
mailing list