RFR (XXS): 7158682: G1: Handle leak when running nsk.sysdict tests
John Cuthbertson
john.cuthbertson at oracle.com
Fri Apr 27 17:14:34 UTC 2012
Hi Bengt,
Thanks for the review and good question (and one I asked myself).
I originally had the HandleMark's in the loop bodies but was concerned
about the possible worst case performance scenario where a new handle
area is allocated for handle - which is then freed at the HandleMark
destructor. When a handle area is freed - we interact with the Chunk
pools (access to which is serialized using a single global mutex). I was
able to force this to happen by allocating extra handles before the loop.
Without the fixes for 7147724 I would definitely have kept the
HandleMarks in the loop bodies. Prior to the fixes for 7147724 the
deadlock would have kept a thread spinning in G1CollectedHeap::collect
indefinitely, not stalling while the GCLocker was active, and not
bailing out when a marking cycle was already in progress, caused a
thread to spin in G1CollectedHeap::collect() millions of times (I killed
one test program after seeing ~2m retry attempts). I was only able to
reproduce the OOM when the deadlock occurred (and even then it only
happened twice).
With the fixes we now only spin if the prologue was unsuccessful and it
was due to another GC being scheduled (it's caused the GCLocker - we
stall the thread and then retry the GC) and I saw no more than a few
hundred iterations around the loop (s).
I'm OK either way - I don't mind moving the HandleMarks. I'll leave the
choice up to you.
JohnC
On 04/27/12 06:19, Bengt Rutisson wrote:
>
> Hi John,
>
> I think this looks good, but I have a question for my understanding.
>
> You have placed the HandleMark outside loops where we eventually will
> store a handle to the pending list lock. I guess this will free up all
> allocated handles once we exit the loop.
>
> Are there any issues with putting the HandleMark inside the loop and
> free up handles one at a time? Is there still a risk of native OOME
> with your change if we get stuck in one of the allocation loops?
>
> Bengt
>
> On 2012-04-25 20:08, John Cuthbertson wrote:
>> Hi Everyone,
>>
>> Can I have a couple of volunteers to review this fairly small change?
>> The webrev can be found at:
>> http://cr.openjdk.java.net/~johnc/7158682/webrev/
>>
>> Summary:
>> This issue was mainly caused by the excessively high number of GC
>> retry attempts seen (and fixed) in 7147724. Each time we retry the GC
>> we allocate a handle to hold the pending list lock and there is no
>> HandleMark around the code that retries the GC. As a result of the
>> combination of these two factors, we ran out of C heap and couldn't
>> allocate a new HandleArea. The fixes for 7147724 address the
>> excessive retry attempts - but we can still retry the GC if the
>> prologue operations are not successful and we still don't free the
>> handles allocated in the prologues of the attempted GC operations.
>> The changes for this CR address this last issue.
>>
>> Many thanks to Mikael Gerdin for his initial diagnosis of the issue.
>>
>> Testing:
>> The failing test cases (with the changes for 7147724 removed and
>> instrumentation in the HandleMark class); GC test suite with
>> +ExplicitGCInvokesConcurrent; jprt
>>
>> Thanks,
>>
>> JohnC
>
More information about the hotspot-gc-dev
mailing list