RFR (XXS): 7158682: G1: Handle leak when running nsk.sysdict tests

Mon Apr 30 11:32:56 UTC 2012

Got some good pointer from Mikael Gerdin. There are actually two handles 
created. One in instanceRefKlass::acquire_pending_list_lock() and one in 
instanceRefKlass::release_and_notify_pending_list_lock(). So if we go 
for HandleMark inside instanceRefKlass::acquire_pending_list_lock() we 
also need to put one inside of 
instanceRefKlass::release_and_notify_pending_list_lock().

The nice part about doing that is that we will fix this for CMS as well. 
Currently CMS leaks two handles each remark phase.

Bengt

On 2012-04-30 11:13, Bengt Rutisson wrote:
>
> Hi John,
>
> Thanks for the detailed explanation. I understand your concern about 
> worst case overhead for HandleMarks. However, when this code executes 
> we are anyway in slow path allocation where we take locks, so I'm not 
> so sure this overhead will be too painful. Also, it is not a very 
> likely case.
>
> So, I would prefer to move the HandleMark closer to where the Handle 
> is allocated.
>
> I started to look a bit closer to the call graph and came up with this 
> call hierarchy:
>
> instanceRefKlass::acquire_pending_list_lock()
>   VM_GC_Operation::acquire_pending_list_lock()
>     VM_GC_Operation::doit_prologue()
>       VM_G1OperationWithAllocRequest
>         VM_G1CollectForAllocation
>           G1CollectedHeap::mem_allocate() <----- HandleMark
>       VM_G1IncCollectionPause::doit_prologue()
>         G1CollectedHeap::do_collection_pause()
>           G1CollectedHeap::attempt_allocation_slow() <----- HandleMark
>           G1CollectedHeap::attempt_allocation_humongous() <----- 
> HandleMark
>         G1CollectedHeap::collect() <----- No HandleMark
>
> instanceRefKlass::acquire_pending_list_lock() is where we create the 
> Handle. The entry points marked with "<----- HandleMark" is where you 
> added HandleMarks. I think there might be one missing in 
> G1CollectedHeap::collect(), right?
>
> Looking at this graph I would prefer to put the HandleMark even closer 
> to where we create the Handle. In fact, why not put it inside 
> instanceRefKlass::acquire_pending_list_lock()? Or at least put it in 
> the VM_GC_Operation::doit_prologue(). After that the code forks and we 
> run the risk that some path will be missing a HandleMark.
>
> I realize that these two places are shared code and not G1 specific, 
> but having an extra HandleMark should be fine (apart from the worst 
> case performance issue that you mentioned before). And who knows, 
> maybe other collectors (CMS?) are already leaking handles?
>
> Bengt
>
>
>
> On 2012-04-27 19:14, John Cuthbertson wrote:
>> Hi Bengt,
>>
>> Thanks for the review and good question (and one I asked myself).
>>
>> I originally had the HandleMark's in the loop bodies but was 
>> concerned about the possible worst case performance scenario where a 
>> new handle area is allocated for handle - which is then freed at the 
>> HandleMark destructor. When a handle area is freed - we interact with 
>> the Chunk pools (access to which is serialized using a single global 
>> mutex). I was able to force this to happen by allocating extra 
>> handles before the loop.
>>
>> Without the fixes for 7147724 I would definitely have kept the 
>> HandleMarks in the loop bodies. Prior to the fixes for 7147724 the 
>> deadlock would have kept a thread spinning in 
>> G1CollectedHeap::collect indefinitely, not stalling while the 
>> GCLocker was active, and not bailing out when a marking cycle was 
>> already in progress, caused a thread to spin in 
>> G1CollectedHeap::collect() millions of times (I killed one test 
>> program after seeing ~2m retry attempts). I was only able to 
>> reproduce the OOM when the deadlock occurred (and even then it only 
>> happened twice).
>>
>> With the fixes we now only spin if the prologue was unsuccessful and 
>> it was due to another GC being scheduled (it's caused the GCLocker - 
>> we stall the thread and then retry the GC) and I saw no more than a 
>> few hundred iterations around the loop (s).
>>
>> I'm OK either way - I don't mind moving the HandleMarks. I'll leave 
>> the choice up to you.
>>
>> JohnC
>>
>> On 04/27/12 06:19, Bengt Rutisson wrote:
>>>
>>> Hi John,
>>>
>>> I think this looks good, but I have a question for my understanding.
>>>
>>> You have placed the HandleMark outside loops where we eventually 
>>> will store a handle to the pending list lock. I guess this will free 
>>> up all allocated handles once we exit the loop.
>>>
>>> Are there any issues with putting the HandleMark inside the loop and 
>>> free up handles one at a time? Is there still a risk of native OOME 
>>> with your change if we get stuck in one of the allocation loops?
>>>
>>> Bengt
>>>
>>> On 2012-04-25 20:08, John Cuthbertson wrote:
>>>> Hi Everyone,
>>>>
>>>> Can I have a couple of volunteers to review this fairly small 
>>>> change? The webrev can be found at: 
>>>> http://cr.openjdk.java.net/~johnc/7158682/webrev/
>>>>
>>>> Summary:
>>>> This issue was mainly caused by the excessively high number of GC 
>>>> retry attempts seen (and fixed) in 7147724. Each time we retry the 
>>>> GC we allocate a handle to hold the pending list lock and there is 
>>>> no HandleMark around the code that retries the GC. As a result of 
>>>> the combination of these two factors, we ran out of C heap and 
>>>> couldn't allocate a new HandleArea. The fixes for 7147724 address 
>>>> the excessive retry attempts - but we can still retry the GC if the 
>>>> prologue operations are not successful and we still don't free the 
>>>> handles allocated in the prologues of the attempted GC operations. 
>>>> The changes for this CR address this last issue.
>>>>
>>>> Many thanks to Mikael Gerdin for his initial diagnosis of the issue.
>>>>
>>>> Testing:
>>>> The failing test cases (with the changes for 7147724 removed and 
>>>> instrumentation in the HandleMark class); GC test suite with 
>>>> +ExplicitGCInvokesConcurrent; jprt
>>>>
>>>> Thanks,
>>>>
>>>> JohnC
>>>
>>
>