RFR(S) 8245925 G1 allocates EDEN region after CDS has executed GC

Thu Jun 4 19:04:45 UTC 2020

On Wed, Jun 3, 2020 at 10:56 PM Ioi Lam <ioi.lam at oracle.com> wrote:
>
>
> > On 5/29/20 9:40 PM, Yumin Qi wrote:
> >> HI, Ioi
> >>
> >>   If the allocation of EDEN happens between GC and dump, should we
> >> put the GC action in VM_PopulateDumpSharedSpace? This way, at
> >> safepoint there should no allocation happens. The stack trace showed
> >> it happened with a Java Thread, which should be blocked at safepoint.
> >>
> >
> > Hi Yumin,
> >
> > I think GCs cannot be executed inside a safepoint, because some parts
> > of GC need to execute in a safepoint, so they will be blocked until
> > VM_PopulateDumpSharedSpace::doit has returned.
> >
> > Anyway, as I mentioned in my reply to Jiangli, there's a better way to
> > fix this, so I will withdraw the current patch.
> >
>
> Hi Yumin,
>
> Actually, I changed my mind again, and implemented your suggestion :-)
>
> There's actually a way to invoke GC inside a safepoint (it's used by
> "jcmd gc.heap_dump", for example). So I changed the CDS code to do the
> same thing. It's a much simpler change and does what I want -- no other
> thread will be able to make any heap allocation after the GC has
> completed, so no EDEN region will be allocated:
>
> http://cr.openjdk.java.net/~iklam/jdk15/8245925-g1-eden-after-cds-gc.v02/

Going with the simple approach for the short term sounds ok.

1612 void VM_PopulateDumpSharedSpace::doit() {
...
1615     if (GCLocker::is_active()) {
1616       // This should rarely happen during -Xshare:dump, if at
all, but just to be safe.
1617       log_debug(cds)("GCLocker::is_active() ... try again");
1618       return;
1619     }

MetaspaceShared::preload_and_dump()
...
1945     while (true) {
1946       {
1947         MutexLocker ml(THREAD, Heap_lock);
1948         VMThread::execute(&op);
1949       }
1950       // If dumping has finished, the VM would have exited. The
only reason to
1951       // come back here is to wait for the GCLocker.
1952       assert(HeapShared::is_heap_object_archiving_allowed(), "sanity");
1953       os::naked_short_sleep(1);
1954     }
1955   }

Instead of doing the while/retry, calling
GCLocker::stall_until_clear() in MetaspaceShared::preload_and_dump
before VM_PopulateDumpSharedSpace probably is much cleaner?

Please also add a comment in MetaspaceShared::preload_and_dump to
explain that Universe::heap()->collect_as_vm_thread expects that
Heap_lock is already held by the thread, and that's the reason for the
call to MutexLocker ml(THREAD, Heap_lock).

Do you also need to call
Universe::heap()->soft_ref_policy()->set_should_clear_all_soft_refs(true)
before GC?

Best,
Jiangli

>
> The check for "if (GCLocker::is_active())" should almost always be
> false. I left it there just for safety:
>
> During -Xshare:dump, we execute Java code to build the module graph,
> load classes, etc. So theoretically someone could try to parallelize
> some of that Java code in the future. Theoretically when CDS has entered
> the safepoint, another thread could be in the middle a JNI method that
> has held the GCLock.
>
> Thanks
> - Ioi
>
> >
> >>
> >> On 5/29/20 7:29 PM, Ioi Lam wrote:
> >>> https://bugs.openjdk.java.net/browse/JDK-8245925
> >>> http://cr.openjdk.java.net/~iklam/jdk15/8245925-g1-eden-after-cds-gc.v01/
> >>>
> >>>
> >>>
> >>> Summary:
> >>>
> >>> CDS supports archived heap objects only for G1. During -Xshare:dump,
> >>> CDS executes a full GC so that G1 will compact the heap regions,
> >>> leaving
> >>> maximum contiguous free space at the top of the heap. Then, the
> >>> archived
> >>> heap regions are allocated from the top of the heap.
> >>>
> >>> Under some circumstances, java.lang.ref.Cleaners will execute
> >>> after the GC has completed. The cleaners may allocate or
> >>> synchronized, which
> >>> will cause G1 to allocate an EDEN region at the top of the heap.
> >>>
> >>> The fix is simple -- after CDS has entered a safepoint, if EDEN
> >>> regions exist,
> >>> exit the safepoint, run GC, and try again. Eventually all the
> >>> cleaners will
> >>> be executed and no more allocation can happen.
> >>>
> >>> For safety, I limit the retry count to 30 (or about total 9 seconds).
> >>>
> >>> Thanks
> >>> - Ioi
> >>>
> >>>
> >>> <https://bugs.openjdk.java.net/browse/JDK-8245925>
> >>
> >
>