RFR(S) 8245925 G1 allocates EDEN region after CDS has executed GC

Fri Jun 5 21:38:41 UTC 2020

Hi Ioi,

Thanks for the updated webrev.

To avoid the assert, could you call GCLocker::stall_until_clear in
MetaspaceShared::preload_and_dump() before grabbing the 'Heap_lock'
and VMThread::execute(&op), like following. It is the best to check
with Thomas and other GC experts about the interaction between
'Heap_lock' and 'GCLocker'.

        if (HeapShared::is_heap_object_archiving_allowed()) {
          GCLocker::stall_until_clear();
        }

There is also a compiler error caused by the extra 'THREAD' arg for
MutexLocker. Please fix:

mutexLocker.hpp:205:22: note:   no known conversion for argument 1
from ‘Thread*’ to ‘Mutex*’

  205 |   MutexLocker(Mutex* mutex, Thread* thread,
Mutex::SafepointCheckFlag flag = Mutex::_safepoint_check_flag) :

      |               ~~~~~~~^~~~~

mutexLocker.hpp:191:3: note: candidate:
‘MutexLocker::MutexLocker(Mutex*, Mutex::SafepointCheckFlag)’

  191 |   MutexLocker(Mutex* mutex, Mutex::SafepointCheckFlag flag =
Mutex::_safepoint_check_flag) :

   ... (rest of output omitted)

Best,
Jiangli

On Thu, Jun 4, 2020 at 9:37 PM Ioi Lam <ioi.lam at oracle.com> wrote:
>
> Hi Jiangli,
>
> Updated webrev is here:
>
> http://cr.openjdk.java.net/~iklam/jdk15/8245925-g1-eden-after-cds-gc.v03/
>
> The only difference with the previous version is this part:
>
> 1975 MutexLocker ml(THREAD, HeapShared::is_heap_object_archiving_allowed() ?
> 1976                Heap_lock : NULL);     // needed for
> collect_as_vm_thread
>
> Thanks
> - Ioi
>
>
> On 6/4/20 5:23 PM, Jiangli Zhou wrote:
> > Ioi, do you have a new webrev?
> >
> > Best,
> > Jiangli
> >
> > On Thu, Jun 4, 2020 at 4:54 PM Ioi Lam <ioi.lam at oracle.com> wrote:
> >>
> >>
> >> On 6/4/20 12:04 PM, Jiangli Zhou wrote:
> >>> On Wed, Jun 3, 2020 at 10:56 PM Ioi Lam <ioi.lam at oracle.com> wrote:
> >>>>> On 5/29/20 9:40 PM, Yumin Qi wrote:
> >>>>>> HI, Ioi
> >>>>>>
> >>>>>>     If the allocation of EDEN happens between GC and dump, should we
> >>>>>> put the GC action in VM_PopulateDumpSharedSpace? This way, at
> >>>>>> safepoint there should no allocation happens. The stack trace showed
> >>>>>> it happened with a Java Thread, which should be blocked at safepoint.
> >>>>>>
> >>>>> Hi Yumin,
> >>>>>
> >>>>> I think GCs cannot be executed inside a safepoint, because some parts
> >>>>> of GC need to execute in a safepoint, so they will be blocked until
> >>>>> VM_PopulateDumpSharedSpace::doit has returned.
> >>>>>
> >>>>> Anyway, as I mentioned in my reply to Jiangli, there's a better way to
> >>>>> fix this, so I will withdraw the current patch.
> >>>>>
> >>>> Hi Yumin,
> >>>>
> >>>> Actually, I changed my mind again, and implemented your suggestion :-)
> >>>>
> >>>> There's actually a way to invoke GC inside a safepoint (it's used by
> >>>> "jcmd gc.heap_dump", for example). So I changed the CDS code to do the
> >>>> same thing. It's a much simpler change and does what I want -- no other
> >>>> thread will be able to make any heap allocation after the GC has
> >>>> completed, so no EDEN region will be allocated:
> >>>>
> >>>> http://cr.openjdk.java.net/~iklam/jdk15/8245925-g1-eden-after-cds-gc.v02/
> >>> Going with the simple approach for the short term sounds ok.
> >>>
> >>> 1612 void VM_PopulateDumpSharedSpace::doit() {
> >>> ...
> >>> 1615     if (GCLocker::is_active()) {
> >>> 1616       // This should rarely happen during -Xshare:dump, if at
> >>> all, but just to be safe.
> >>> 1617       log_debug(cds)("GCLocker::is_active() ... try again");
> >>> 1618       return;
> >>> 1619     }
> >>>
> >>> MetaspaceShared::preload_and_dump()
> >>> ...
> >>> 1945     while (true) {
> >>> 1946       {
> >>> 1947         MutexLocker ml(THREAD, Heap_lock);
> >>> 1948         VMThread::execute(&op);
> >>> 1949       }
> >>> 1950       // If dumping has finished, the VM would have exited. The
> >>> only reason to
> >>> 1951       // come back here is to wait for the GCLocker.
> >>> 1952       assert(HeapShared::is_heap_object_archiving_allowed(), "sanity");
> >>> 1953       os::naked_short_sleep(1);
> >>> 1954     }
> >>> 1955   }
> >>>
> >>>
> >>> Instead of doing the while/retry, calling
> >>> GCLocker::stall_until_clear() in MetaspaceShared::preload_and_dump
> >>> before VM_PopulateDumpSharedSpace probably is much cleaner?
> >> Hi Jiangli, I tried your suggestion, but GCLocker::stall_until_clear()
> >> cannot be called in the VM thread:
> >>
> >> #  assert(thread->is_Java_thread()) failed: just checking
> >>
> >>> Please also add a comment in MetaspaceShared::preload_and_dump to
> >>> explain that Universe::heap()->collect_as_vm_thread expects that
> >>> Heap_lock is already held by the thread, and that's the reason for the
> >>> call to MutexLocker ml(THREAD, Heap_lock).
> >> I changed the code to be like this:
> >>
> >>       MutexLocker ml(THREAD, HeapShared::is_heap_object_archiving_allowed() ?
> >>                      Heap_lock : NULL);     // needed for
> >> collect_as_vm_thread
> >>
> >>> Do you also need to call
> >>> Universe::heap()->soft_ref_policy()->set_should_clear_all_soft_refs(true)
> >>> before GC?
> >> There's no need:
> >>
> >> void CollectedHeap::collect_as_vm_thread(GCCause::Cause cause) {
> >>        ...
> >>       case GCCause::_archive_time_gc:
> >>       case GCCause::_metadata_GC_clear_soft_refs: {
> >>         HandleMark hm;
> >>         do_full_collection(true);         // do clear all soft refs
> >>
> >> Thanks
> >> - Ioi
> >>
> >>> Best,
> >>> Jiangli
> >>>
> >>>> The check for "if (GCLocker::is_active())" should almost always be
> >>>> false. I left it there just for safety:
> >>>>
> >>>> During -Xshare:dump, we execute Java code to build the module graph,
> >>>> load classes, etc. So theoretically someone could try to parallelize
> >>>> some of that Java code in the future. Theoretically when CDS has entered
> >>>> the safepoint, another thread could be in the middle a JNI method that
> >>>> has held the GCLock.
> >>>>
> >>>> Thanks
> >>>> - Ioi
> >>>>
> >>>>>> On 5/29/20 7:29 PM, Ioi Lam wrote:
> >>>>>>> https://bugs.openjdk.java.net/browse/JDK-8245925
> >>>>>>> http://cr.openjdk.java.net/~iklam/jdk15/8245925-g1-eden-after-cds-gc.v01/
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> Summary:
> >>>>>>>
> >>>>>>> CDS supports archived heap objects only for G1. During -Xshare:dump,
> >>>>>>> CDS executes a full GC so that G1 will compact the heap regions,
> >>>>>>> leaving
> >>>>>>> maximum contiguous free space at the top of the heap. Then, the
> >>>>>>> archived
> >>>>>>> heap regions are allocated from the top of the heap.
> >>>>>>>
> >>>>>>> Under some circumstances, java.lang.ref.Cleaners will execute
> >>>>>>> after the GC has completed. The cleaners may allocate or
> >>>>>>> synchronized, which
> >>>>>>> will cause G1 to allocate an EDEN region at the top of the heap.
> >>>>>>>
> >>>>>>> The fix is simple -- after CDS has entered a safepoint, if EDEN
> >>>>>>> regions exist,
> >>>>>>> exit the safepoint, run GC, and try again. Eventually all the
> >>>>>>> cleaners will
> >>>>>>> be executed and no more allocation can happen.
> >>>>>>>
> >>>>>>> For safety, I limit the retry count to 30 (or about total 9 seconds).
> >>>>>>>
> >>>>>>> Thanks
> >>>>>>> - Ioi
> >>>>>>>
> >>>>>>>
> >>>>>>> <https://bugs.openjdk.java.net/browse/JDK-8245925>
>