RFR (S) 8137099: OoME with G1 GC before doing a full GC

Thu Sep 24 15:13:11 UTC 2015

Hi,
We regularly see OoM-Errors with G1 in our stress tests.
The same tests, run with the same heap size under ParallelGC and CMS, 
do not show that problem.

The stress tests are based on real-world application code with a lot of 
threads.

Scenario:
We have an application with a lot of threads that spend time in critical 
native sections.

1. An evacuation failure happens during a GC.
2. After clean-up work, the safepoint is left.
3. Another thread can't allocate and triggers a new incremental GC.
4. A thread that can't allocate after an incremental GC triggers a 
full GC. However, the full GC doesn't start because another thread
     started an incremental GC, the GC-locker is active, or the 
GCLocker-initiated GC has not yet been performed.
     If an incremental GC fails because of the GC-locker more often 
than GCLockerRetryAllocationCount (=2) times, an OOME is thrown.
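
To make the retry limit in step 4 concrete, here is a toy model in Java.
It is not HotSpot code: the class, method and parameter names are 
hypothetical, and it assumes that the first GC that is not blocked by the 
GC-locker frees enough memory for the allocation to succeed. Only the flag 
name and its value of 2 come from the description above.

```java
// Toy model only, not HotSpot code. All names except the flag are hypothetical.
class GcLockerRetrySketch {
    // Value of GCLockerRetryAllocationCount as quoted in the scenario.
    static final int GC_LOCKER_RETRY_ALLOCATION_COUNT = 2;

    enum Outcome { ALLOCATED, OUT_OF_MEMORY }

    /**
     * Simulates one thread on the allocation slow path.
     * blockedAttempts: how many consecutive times the GC-locker prevents
     * the requested GC from running before a GC finally goes through.
     */
    static Outcome allocateSlow(int blockedAttempts) {
        int lockerRetries = 0;
        while (true) {
            boolean gcBlockedByLocker = lockerRetries < blockedAttempts;
            if (!gcBlockedByLocker) {
                // Assumption: a GC that actually runs frees enough memory.
                return Outcome.ALLOCATED;
            }
            lockerRetries++;
            if (lockerRetries > GC_LOCKER_RETRY_ALLOCATION_COUNT) {
                // Blocked more often than the limit: give up with an OOME.
                return Outcome.OUT_OF_MEMORY;
            }
        }
    }

    public static void main(String[] args) {
        for (int blocked = 0; blocked <= 3; blocked++) {
            System.out.println(blocked + " blocked attempt(s): " + allocateSlow(blocked));
        }
    }
}
```

In this model a thread survives being blocked twice, but a third blocked 
attempt ends in an OOME even though a later GC could have freed memory.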

Without critical native code, we would try to trigger a full GC until we 
succeed. In that case there would only be a performance issue, not an OOME.

In contrast to the other GCs, G1 leaves the safepoint after an evacuation 
failure.

The proposed fix is to start a full GC before leaving the safepoint.
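
As a rough illustration of the intended ordering (not the code in the 
webrev), the following Java sketch records the steps of one pause. The 
class, method and step names are hypothetical; the only point is that with 
the fix the full GC happens before the safepoint is left, so no other 
thread can enter a critical native section in between.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative ordering only, not the webrev's code. All names are hypothetical.
class SafepointEscalationSketch {
    /**
     * Returns the ordered steps of one GC pause.
     * evacuationFails: whether the incremental GC hits an evacuation failure.
     * fullGcBeforeLeaving: true models the proposed fix, false the current behavior.
     */
    static List<String> pause(boolean evacuationFails, boolean fullGcBeforeLeaving) {
        List<String> steps = new ArrayList<>();
        steps.add("enter safepoint");
        steps.add("incremental GC");
        if (evacuationFails) {
            steps.add("evacuation failure clean-up");
            if (fullGcBeforeLeaving) {
                // Proposed: escalate while all Java threads are still stopped.
                steps.add("full GC");
            }
        }
        steps.add("leave safepoint");
        return steps;
    }

    public static void main(String[] args) {
        System.out.println("current:  " + pause(true, false));
        System.out.println("proposed: " + pause(true, true));
    }
}
```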

Bug:
https://bugs.openjdk.java.net/browse/JDK-8137099

Webrev:
http://cr.openjdk.java.net/~asiebenborn/8137099/webrev/

Thanks,
Axel