RFR(S) 8245925 G1 allocates EDEN region after CDS has executed GC

Fri Jun 5 00:23:51 UTC 2020

Ioi, do you have a new webrev?

Best,
Jiangli

On Thu, Jun 4, 2020 at 4:54 PM Ioi Lam <ioi.lam at oracle.com> wrote:
>
>
>
> On 6/4/20 12:04 PM, Jiangli Zhou wrote:
> > On Wed, Jun 3, 2020 at 10:56 PM Ioi Lam <ioi.lam at oracle.com> wrote:
> >>
> >>> On 5/29/20 9:40 PM, Yumin Qi wrote:
> >>>> HI, Ioi
> >>>>
> >>>>    If the allocation of EDEN happens between GC and dump, should we
> >>>> put the GC action in VM_PopulateDumpSharedSpace? This way, at
> >>>> safepoint there should no allocation happens. The stack trace showed
> >>>> it happened with a Java Thread, which should be blocked at safepoint.
> >>>>
> >>> Hi Yumin,
> >>>
> >>> I think GCs cannot be executed inside a safepoint, because some parts
> >>> of GC need to execute in a safepoint, so they will be blocked until
> >>> VM_PopulateDumpSharedSpace::doit has returned.
> >>>
> >>> Anyway, as I mentioned in my reply to Jiangli, there's a better way to
> >>> fix this, so I will withdraw the current patch.
> >>>
> >> Hi Yumin,
> >>
> >> Actually, I changed my mind again, and implemented your suggestion :-)
> >>
> >> There's actually a way to invoke GC inside a safepoint (it's used by
> >> "jcmd gc.heap_dump", for example). So I changed the CDS code to do the
> >> same thing. It's a much simpler change and does what I want -- no other
> >> thread will be able to make any heap allocation after the GC has
> >> completed, so no EDEN region will be allocated:
> >>
> >> http://cr.openjdk.java.net/~iklam/jdk15/8245925-g1-eden-after-cds-gc.v02/
> > Going with the simple approach for the short term sounds ok.
> >
> > 1612 void VM_PopulateDumpSharedSpace::doit() {
> > ...
> > 1615     if (GCLocker::is_active()) {
> > 1616       // This should rarely happen during -Xshare:dump, if at
> > all, but just to be safe.
> > 1617       log_debug(cds)("GCLocker::is_active() ... try again");
> > 1618       return;
> > 1619     }
> >
> > MetaspaceShared::preload_and_dump()
> > ...
> > 1945     while (true) {
> > 1946       {
> > 1947         MutexLocker ml(THREAD, Heap_lock);
> > 1948         VMThread::execute(&op);
> > 1949       }
> > 1950       // If dumping has finished, the VM would have exited. The
> > only reason to
> > 1951       // come back here is to wait for the GCLocker.
> > 1952       assert(HeapShared::is_heap_object_archiving_allowed(), "sanity");
> > 1953       os::naked_short_sleep(1);
> > 1954     }
> > 1955   }
> >
> >
> > Instead of doing the while/retry, calling
> > GCLocker::stall_until_clear() in MetaspaceShared::preload_and_dump
> > before VM_PopulateDumpSharedSpace probably is much cleaner?
>
> Hi Jiangli, I tried your suggestion, but GCLocker::stall_until_clear()
> cannot be called in the VM thread:
>
> #  assert(thread->is_Java_thread()) failed: just checking
>
> > Please also add a comment in MetaspaceShared::preload_and_dump to
> > explain that Universe::heap()->collect_as_vm_thread expects that
> > Heap_lock is already held by the thread, and that's the reason for the
> > call to MutexLocker ml(THREAD, Heap_lock).
>
> I changed the code to be like this:
>
>      MutexLocker ml(THREAD, HeapShared::is_heap_object_archiving_allowed() ?
>                     Heap_lock : NULL);     // needed for
> collect_as_vm_thread
>
> > Do you also need to call
> > Universe::heap()->soft_ref_policy()->set_should_clear_all_soft_refs(true)
> > before GC?
>
> There's no need:
>
> void CollectedHeap::collect_as_vm_thread(GCCause::Cause cause) {
>       ...
>      case GCCause::_archive_time_gc:
>      case GCCause::_metadata_GC_clear_soft_refs: {
>        HandleMark hm;
>        do_full_collection(true);         // do clear all soft refs
>
> Thanks
> - Ioi
>
> > Best,
> > Jiangli
> >
> >> The check for "if (GCLocker::is_active())" should almost always be
> >> false. I left it there just for safety:
> >>
> >> During -Xshare:dump, we execute Java code to build the module graph,
> >> load classes, etc. So theoretically someone could try to parallelize
> >> some of that Java code in the future. Theoretically when CDS has entered
> >> the safepoint, another thread could be in the middle a JNI method that
> >> has held the GCLock.
> >>
> >> Thanks
> >> - Ioi
> >>
> >>>> On 5/29/20 7:29 PM, Ioi Lam wrote:
> >>>>> https://bugs.openjdk.java.net/browse/JDK-8245925
> >>>>> http://cr.openjdk.java.net/~iklam/jdk15/8245925-g1-eden-after-cds-gc.v01/
> >>>>>
> >>>>>
> >>>>>
> >>>>> Summary:
> >>>>>
> >>>>> CDS supports archived heap objects only for G1. During -Xshare:dump,
> >>>>> CDS executes a full GC so that G1 will compact the heap regions,
> >>>>> leaving
> >>>>> maximum contiguous free space at the top of the heap. Then, the
> >>>>> archived
> >>>>> heap regions are allocated from the top of the heap.
> >>>>>
> >>>>> Under some circumstances, java.lang.ref.Cleaners will execute
> >>>>> after the GC has completed. The cleaners may allocate or
> >>>>> synchronized, which
> >>>>> will cause G1 to allocate an EDEN region at the top of the heap.
> >>>>>
> >>>>> The fix is simple -- after CDS has entered a safepoint, if EDEN
> >>>>> regions exist,
> >>>>> exit the safepoint, run GC, and try again. Eventually all the
> >>>>> cleaners will
> >>>>> be executed and no more allocation can happen.
> >>>>>
> >>>>> For safety, I limit the retry count to 30 (or about total 9 seconds).
> >>>>>
> >>>>> Thanks
> >>>>> - Ioi
> >>>>>
> >>>>>
> >>>>> <https://bugs.openjdk.java.net/browse/JDK-8245925>
>