RFR (S) 8137099: OoME with G1 GC before doing a full GC

Axel Siebenborn axel.siebenborn at sap.com
Fri Oct 2 08:49:51 UTC 2015


Hi Mikael,

On 02.10.2015 09:47, Mikael Gerdin wrote:
> Hi Axel,
>
> On 2015-10-02 09:09, Axel Siebenborn wrote:
>> Hi,
>> On 28.09.2015 14:57, Siebenborn, Axel wrote:
>>>
>>> Hi,
>>> On 25.09.2015 11:51 Mikael Gerdin wrote:
>>>> Hi Axel,
>>>>
>>>> On 2015-09-24 17:13, Siebenborn, Axel wrote:
>>>>> Hi,
>>>>> We regularly see OOM errors with G1 in our stress tests.
>>>>> We run the tests with the same heap size with ParallelGC and CMS
>>>>> without that problem.
>>>>>
>>>>> The stress tests are based on real world application code with a lot
>>>>> of threads.
>>>>>
>>>>> Scenario:
>>>>> We have an application with a lot of threads that spend time in
>>>>> critical native sections.
>>>>>
>>>>> 1. An evacuation failure happens during a GC.
>>>>> 2. After clean-up work, the safepoint is left.
>>>>> 3. Another thread can't allocate and triggers a new incremental GC.
>>>>> 4. A thread that can't allocate after an incremental GC triggers a
>>>>>    full GC. However, the full GC doesn't start because another thread
>>>>>    started an incremental GC, the GC locker is active, or the
>>>>>    GCLocker-initiated GC has not yet been performed.
>>>>>    If an incremental GC doesn't succeed due to the GC locker, and if
>>>>>    this happens more often than GCLockerRetryAllocationCount (= 2),
>>>>>    an OOME is thrown.
>>>>>
>>>>> Without critical native code, we would keep trying to trigger a full
>>>>> GC until we succeed. In that case there is just a performance issue,
>>>>> but no OOME.
>>>>>
>>>>> In contrast to the other GCs, the safepoint is left after an
>>>>> evacuation failure.
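
For illustration, the path described above can be sketched roughly like
this. This is a simplified model, not the actual HotSpot code: the three
helper functions and their signatures are hypothetical stand-ins, and only
the GCLockerRetryAllocationCount flag is real.

  #include <cstddef>

  // Simplified model of the allocation path described above.
  class HeapWord;                                   // opaque in this sketch
  HeapWord* try_allocation(size_t word_size);       // NULL when the heap is full
  bool try_incremental_gc(size_t word_size);        // false if the pause was skipped
  bool gc_locker_is_active();
  const unsigned GCLockerRetryAllocationCount = 2;  // default value of the flag

  HeapWord* allocate_with_retries(size_t word_size) {
    unsigned gclocker_retries = 0;
    for (;;) {
      HeapWord* result = try_allocation(word_size);
      if (result != NULL) {
        return result;
      }
      // Request an incremental GC. It may free little or nothing after an
      // evacuation failure, or be skipped because another thread already
      // scheduled a GC or because the GC locker is active.
      bool gc_succeeded = try_incremental_gc(word_size);
      if (!gc_succeeded && gc_locker_is_active()) {
        if (++gclocker_retries > GCLockerRetryAllocationCount) {
          return NULL;   // the caller turns this into an OutOfMemoryError
        }
        continue;        // wait for the GC locker and retry
      }
      // Only when the GC locker is not in the way does the path fall
      // through to eventually attempting a full GC, which is why the
      // scenario above ends in an OOME instead.
    }
  }
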
>>>>
>>>> As I understand the history of it, the evacuation failure handling
>>>> code was written as a way to avoid a Full GC when an evacuation
>>>> failure occurred. The assumption was that the evacuation would have
>>>> freed enough memory before failing such that a Full GC could be 
>>>> avoided.
>>>>
>>>> A middle-of-the-road solution to your problem could be to check the
>>>> amount of free memory after the evacuation failure to see whether a
>>>> full GC should be triggered or not.
>>>>
>>>> If you want to go even further you could do something like:
>>>>   _pause_succeeded =
>>>>     g1h->do_collection_pause_at_safepoint(_target_pause_time_ms);
>>>>   if (_pause_succeeded && _word_size > 0) {
>>>>     bool full_succeeded;
>>>>     _result = g1h->satisfy_failed_allocation(_word_size,
>>>>                                              allocation_context(),
>>>>                                              &full_succeeded);
>>>>
>>>> This would handle the allocation both when the incremental pause gave
>>>> us enough memory and when it didn't; in the latter case G1 will
>>>> perform a full collection according to the standard policy.
>>>>
>>>> This would make the code more similar to VM_G1CollectForAllocation
>>>> (there is an issue with "expect_null_mutator_alloc_region", but that
>>>> seems to only be used for an old assert).
>>>>
>>>> What do you think?
>>>>
>>>> /Mikael
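
For readers following the thread, the suggested fragment would sit inside
the pause operation roughly like this (assuming it goes into
VM_G1IncCollectionPause::doit()). This is only a sketch of the idea, not
the actual patch, and the rest of the method is elided:

  // Sketch only; the surrounding setup of the pause operation is omitted.
  void VM_G1IncCollectionPause::doit() {
    G1CollectedHeap* g1h = G1CollectedHeap::heap();
    // ... pause setup elided ...
    _pause_succeeded =
      g1h->do_collection_pause_at_safepoint(_target_pause_time_ms);
    if (_pause_succeeded && _word_size > 0) {
      // Let satisfy_failed_allocation() retry the allocation and, if the
      // incremental pause did not free enough memory, fall back to the
      // standard policy (heap expansion and/or a full GC) while still
      // inside the safepoint.
      bool full_succeeded;
      _result = g1h->satisfy_failed_allocation(_word_size,
                                               allocation_context(),
                                               &full_succeeded);
    }
  }
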
>>>>
>>>>>
>>>>> The proposed fix is to start a full GC before leaving the safepoint.
>>>>>
>>>>> Bug:
>>>>> https://bugs.openjdk.java.net/browse/JDK-8137099
>>>>>
>>>>> Webrev:
>>>>> http://cr.openjdk.java.net/~asiebenborn/8137099/webrev/
>>>>>
>>>>> Thanks,
>>>>> Axel
>>>>>
>>>>
>>> I ran some tests during the weekend without any problems and updated
>>> the webrev.
>>> http://cr.openjdk.java.net/~asiebenborn/8137099/webrev/
>>>
>>> Thanks,
>>> Axel
>> I discovered that my change doesn't take into account that collections
>> triggered by the GCLocker don't have an allocation request (_word_size
>> == 0). However, in that case a full collection should happen if the
>> incremental GC didn't free any memory.
>>
>> I created a new webrev:
>> http://cr.openjdk.java.net/~asiebenborn/8137099_0/webrev/
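
To picture the case described above: a GCLocker-initiated pause carries no
allocation request, so the _word_size > 0 branch never runs for it. The
following is only an illustration of that situation, not the contents of
the new webrev; should_upgrade_to_full_gc() is a hypothetical helper name.

    if (_pause_succeeded) {
      if (_word_size > 0) {
        // A failed allocation triggered this pause: retry the allocation
        // and fall back to the standard policy (expansion / full GC).
        bool full_succeeded;
        _result = g1h->satisfy_failed_allocation(_word_size,
                                                 allocation_context(),
                                                 &full_succeeded);
      } else if (g1h->should_upgrade_to_full_gc()) {  // hypothetical helper
        // A GCLocker-initiated pause has nothing to satisfy, but if the
        // incremental GC freed no memory, upgrade to a full collection
        // before leaving the safepoint.
        g1h->do_full_collection(false /* clear_all_soft_refs */);
      }
    }
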
>
> Is this patch supposed to be combined with the one in the 
> 8137099/webrev directory?
No, this is a new patch and should be applied alone. Sorry for the 
confusion.
>
> I'm planning on running some internal testing on this over the weekend 
> as well.
>
> /Mikael
>
>>
>> Thanks,
>> Axel
>
Thanks,
Axel


