RFR (XXS): 7158682: G1: Handle leak when running nsk.sysdict tests

Fri Apr 27 17:14:34 UTC 2012

Hi Bengt,

Thanks for the review and good question (and one I asked myself).

I originally had the HandleMark's in the loop bodies but was concerned 
about the possible worst case performance scenario where a new handle 
area is allocated for handle - which is then freed at the HandleMark 
destructor. When a handle area is freed - we interact with the Chunk 
pools (access to which is serialized using a single global mutex). I was 
able to force this to happen by allocating extra handles before the loop.

Without the fixes for 7147724 I would definitely have kept the 
HandleMarks in the loop bodies. Prior to the fixes for 7147724 the 
deadlock would have kept a thread spinning in G1CollectedHeap::collect 
indefinitely, not stalling while the GCLocker was active, and not 
bailing out when a marking cycle was already in progress, caused a 
thread to spin in G1CollectedHeap::collect() millions of times (I killed 
one test program after seeing ~2m retry attempts). I was only able to 
reproduce the OOM when the deadlock occurred (and even then it only 
happened twice).

With the fixes we now only spin if the prologue was unsuccessful and it 
was due to another GC being scheduled (it's caused the GCLocker - we 
stall the thread and then retry the GC) and I saw no more than a few 
hundred iterations around the loop (s).

I'm OK either way - I don't mind moving the HandleMarks. I'll leave the 
choice up to you.

JohnC

On 04/27/12 06:19, Bengt Rutisson wrote:
>
> Hi John,
>
> I think this looks good, but I have a question for my understanding.
>
> You have placed the HandleMark outside loops where we eventually will 
> store a handle to the pending list lock. I guess this will free up all 
> allocated handles once we exit the loop.
>
> Are there any issues with putting the HandleMark inside the loop and 
> free up handles one at a time? Is there still a risk of native OOME 
> with your change if we get stuck in one of the allocation loops?
>
> Bengt
>
> On 2012-04-25 20:08, John Cuthbertson wrote:
>> Hi Everyone,
>>
>> Can I have a couple of volunteers to review this fairly small change? 
>> The webrev can be found at: 
>> http://cr.openjdk.java.net/~johnc/7158682/webrev/
>>
>> Summary:
>> This issue was mainly caused by the excessively high number of GC 
>> retry attempts seen (and fixed) in 7147724. Each time we retry the GC 
>> we allocate a handle to hold the pending list lock and there is no 
>> HandleMark around the code that retries the GC. As a result of the 
>> combination of these two factors, we ran out of C heap and couldn't 
>> allocate a new HandleArea. The fixes for 7147724 address the 
>> excessive retry attempts - but we can still retry the GC if the 
>> prologue operations are not successful and we still don't free the 
>> handles allocated in the prologues of the attempted GC operations. 
>> The changes for this CR address this last issue.
>>
>> Many thanks to Mikael Gerdin for his initial diagnosis of the issue.
>>
>> Testing:
>> The failing test cases (with the changes for 7147724 removed and 
>> instrumentation in the HandleMark class); GC test suite with 
>> +ExplicitGCInvokesConcurrent; jprt
>>
>> Thanks,
>>
>> JohnC
>