RFR(S) 8245925 G1 allocates EDEN region after CDS has executed GC

Mon Jun 8 18:00:32 UTC 2020

Hi Ioi,

On Fri, Jun 5, 2020 at 10:51 PM Ioi Lam <ioi.lam at oracle.com> wrote:
>
>
>
> On 6/5/20 2:38 PM, Jiangli Zhou wrote:
> > Hi Ioi,
> >
> > Thanks for the updated webrev.
> >
> > To avoid the assert, could you call GCLocker::stall_until_clear in
> > MetaspaceShared::preload_and_dump() before grabbing the 'Heap_lock'
> > and VMThread::execute(&op), like following. It is the best to check
> > with Thomas and other GC experts about the interaction between
> > 'Heap_lock' and 'GCLocker'.
> >
> >          if (HeapShared::is_heap_object_archiving_allowed()) {
> >            GCLocker::stall_until_clear();
> >          }
> Hi Jiangli,
>
> Thanks for the review.
>
> I don't think this is sufficient. Immediately after
> GCLocker::stall_until_clear() has returned, another thread can invoke a
> JNI function that will activate the GCLocker. That's stated by the
> gcLocker.hpp comments:
>
> http://hg.openjdk.java.net/jdk/jdk/file/882b61be2c19/src/hotspot/share/gc/shared/gcLocker.hpp#l109
>
> I've pinged the GC team for this to get their opinion on this.
>

Thanks for pinging. Thomas pointed out in his email that
GCLocker::stall_until_clear() would only wait if a gc had been
requested while the JNI critical sections were active. Otherwise
GCLocker::stall_until_clear() would do nothing and let the caller
continue. My suggestion likely does not achieve what you intend to do.

Could you please describe the specific issue that you want to address
with GCLocker?

> > There is also a compiler error caused by the extra 'THREAD' arg for
> > MutexLocker. Please fix:
>
> I think you are using an older repo. The MutexLocker constructor has
> been changed since Jan 2020:
>
> http://hg.openjdk.java.net/jdk/jdk/annotate/882b61be2c19/src/hotspot/share/runtime/mutexLocker.hpp#l210
>

Ok, thanks. My repo was not updated properly due to unresolved tooling
issues on my side.

Best,
Jiangli

> Thanks
> - Ioi
>
> > mutexLocker.hpp:205:22: note:   no known conversion for argument 1
> > from ‘Thread*’ to ‘Mutex*’
> >
> >    205 |   MutexLocker(Mutex* mutex, Thread* thread,
> > Mutex::SafepointCheckFlag flag = Mutex::_safepoint_check_flag) :
> >
> >        |               ~~~~~~~^~~~~
> >
> > mutexLocker.hpp:191:3: note: candidate:
> > ‘MutexLocker::MutexLocker(Mutex*, Mutex::SafepointCheckFlag)’
> >
> >    191 |   MutexLocker(Mutex* mutex, Mutex::SafepointCheckFlag flag =
> > Mutex::_safepoint_check_flag) :
> >
> >     ... (rest of output omitted)
> >
> > Best,
> > Jiangli
> >
> > On Thu, Jun 4, 2020 at 9:37 PM Ioi Lam <ioi.lam at oracle.com> wrote:
> >> Hi Jiangli,
> >>
> >> Updated webrev is here:
> >>
> >> http://cr.openjdk.java.net/~iklam/jdk15/8245925-g1-eden-after-cds-gc.v03/
> >>
> >> The only difference with the previous version is this part:
> >>
> >> 1975 MutexLocker ml(THREAD, HeapShared::is_heap_object_archiving_allowed() ?
> >> 1976                Heap_lock : NULL);     // needed for
> >> collect_as_vm_thread
> >>
> >> Thanks
> >> - Ioi
> >>
> >>
> >> On 6/4/20 5:23 PM, Jiangli Zhou wrote:
> >>> Ioi, do you have a new webrev?
> >>>
> >>> Best,
> >>> Jiangli
> >>>
> >>> On Thu, Jun 4, 2020 at 4:54 PM Ioi Lam <ioi.lam at oracle.com> wrote:
> >>>>
> >>>> On 6/4/20 12:04 PM, Jiangli Zhou wrote:
> >>>>> On Wed, Jun 3, 2020 at 10:56 PM Ioi Lam <ioi.lam at oracle.com> wrote:
> >>>>>>> On 5/29/20 9:40 PM, Yumin Qi wrote:
> >>>>>>>> HI, Ioi
> >>>>>>>>
> >>>>>>>>      If the allocation of EDEN happens between GC and dump, should we
> >>>>>>>> put the GC action in VM_PopulateDumpSharedSpace? This way, at
> >>>>>>>> safepoint there should no allocation happens. The stack trace showed
> >>>>>>>> it happened with a Java Thread, which should be blocked at safepoint.
> >>>>>>>>
> >>>>>>> Hi Yumin,
> >>>>>>>
> >>>>>>> I think GCs cannot be executed inside a safepoint, because some parts
> >>>>>>> of GC need to execute in a safepoint, so they will be blocked until
> >>>>>>> VM_PopulateDumpSharedSpace::doit has returned.
> >>>>>>>
> >>>>>>> Anyway, as I mentioned in my reply to Jiangli, there's a better way to
> >>>>>>> fix this, so I will withdraw the current patch.
> >>>>>>>
> >>>>>> Hi Yumin,
> >>>>>>
> >>>>>> Actually, I changed my mind again, and implemented your suggestion :-)
> >>>>>>
> >>>>>> There's actually a way to invoke GC inside a safepoint (it's used by
> >>>>>> "jcmd gc.heap_dump", for example). So I changed the CDS code to do the
> >>>>>> same thing. It's a much simpler change and does what I want -- no other
> >>>>>> thread will be able to make any heap allocation after the GC has
> >>>>>> completed, so no EDEN region will be allocated:
> >>>>>>
> >>>>>> http://cr.openjdk.java.net/~iklam/jdk15/8245925-g1-eden-after-cds-gc.v02/
> >>>>> Going with the simple approach for the short term sounds ok.
> >>>>>
> >>>>> 1612 void VM_PopulateDumpSharedSpace::doit() {
> >>>>> ...
> >>>>> 1615     if (GCLocker::is_active()) {
> >>>>> 1616       // This should rarely happen during -Xshare:dump, if at
> >>>>> all, but just to be safe.
> >>>>> 1617       log_debug(cds)("GCLocker::is_active() ... try again");
> >>>>> 1618       return;
> >>>>> 1619     }
> >>>>>
> >>>>> MetaspaceShared::preload_and_dump()
> >>>>> ...
> >>>>> 1945     while (true) {
> >>>>> 1946       {
> >>>>> 1947         MutexLocker ml(THREAD, Heap_lock);
> >>>>> 1948         VMThread::execute(&op);
> >>>>> 1949       }
> >>>>> 1950       // If dumping has finished, the VM would have exited. The
> >>>>> only reason to
> >>>>> 1951       // come back here is to wait for the GCLocker.
> >>>>> 1952       assert(HeapShared::is_heap_object_archiving_allowed(), "sanity");
> >>>>> 1953       os::naked_short_sleep(1);
> >>>>> 1954     }
> >>>>> 1955   }
> >>>>>
> >>>>>
> >>>>> Instead of doing the while/retry, calling
> >>>>> GCLocker::stall_until_clear() in MetaspaceShared::preload_and_dump
> >>>>> before VM_PopulateDumpSharedSpace probably is much cleaner?
> >>>> Hi Jiangli, I tried your suggestion, but GCLocker::stall_until_clear()
> >>>> cannot be called in the VM thread:
> >>>>
> >>>> #  assert(thread->is_Java_thread()) failed: just checking
> >>>>
> >>>>> Please also add a comment in MetaspaceShared::preload_and_dump to
> >>>>> explain that Universe::heap()->collect_as_vm_thread expects that
> >>>>> Heap_lock is already held by the thread, and that's the reason for the
> >>>>> call to MutexLocker ml(THREAD, Heap_lock).
> >>>> I changed the code to be like this:
> >>>>
> >>>>        MutexLocker ml(THREAD, HeapShared::is_heap_object_archiving_allowed() ?
> >>>>                       Heap_lock : NULL);     // needed for
> >>>> collect_as_vm_thread
> >>>>
> >>>>> Do you also need to call
> >>>>> Universe::heap()->soft_ref_policy()->set_should_clear_all_soft_refs(true)
> >>>>> before GC?
> >>>> There's no need:
> >>>>
> >>>> void CollectedHeap::collect_as_vm_thread(GCCause::Cause cause) {
> >>>>         ...
> >>>>        case GCCause::_archive_time_gc:
> >>>>        case GCCause::_metadata_GC_clear_soft_refs: {
> >>>>          HandleMark hm;
> >>>>          do_full_collection(true);         // do clear all soft refs
> >>>>
> >>>> Thanks
> >>>> - Ioi
> >>>>
> >>>>> Best,
> >>>>> Jiangli
> >>>>>
> >>>>>> The check for "if (GCLocker::is_active())" should almost always be
> >>>>>> false. I left it there just for safety:
> >>>>>>
> >>>>>> During -Xshare:dump, we execute Java code to build the module graph,
> >>>>>> load classes, etc. So theoretically someone could try to parallelize
> >>>>>> some of that Java code in the future. Theoretically when CDS has entered
> >>>>>> the safepoint, another thread could be in the middle a JNI method that
> >>>>>> has held the GCLock.
> >>>>>>
> >>>>>> Thanks
> >>>>>> - Ioi
> >>>>>>
> >>>>>>>> On 5/29/20 7:29 PM, Ioi Lam wrote:
> >>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8245925
> >>>>>>>>> http://cr.openjdk.java.net/~iklam/jdk15/8245925-g1-eden-after-cds-gc.v01/
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Summary:
> >>>>>>>>>
> >>>>>>>>> CDS supports archived heap objects only for G1. During -Xshare:dump,
> >>>>>>>>> CDS executes a full GC so that G1 will compact the heap regions,
> >>>>>>>>> leaving
> >>>>>>>>> maximum contiguous free space at the top of the heap. Then, the
> >>>>>>>>> archived
> >>>>>>>>> heap regions are allocated from the top of the heap.
> >>>>>>>>>
> >>>>>>>>> Under some circumstances, java.lang.ref.Cleaners will execute
> >>>>>>>>> after the GC has completed. The cleaners may allocate or
> >>>>>>>>> synchronized, which
> >>>>>>>>> will cause G1 to allocate an EDEN region at the top of the heap.
> >>>>>>>>>
> >>>>>>>>> The fix is simple -- after CDS has entered a safepoint, if EDEN
> >>>>>>>>> regions exist,
> >>>>>>>>> exit the safepoint, run GC, and try again. Eventually all the
> >>>>>>>>> cleaners will
> >>>>>>>>> be executed and no more allocation can happen.
> >>>>>>>>>
> >>>>>>>>> For safety, I limit the retry count to 30 (or about total 9 seconds).
> >>>>>>>>>
> >>>>>>>>> Thanks
> >>>>>>>>> - Ioi
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> <https://bugs.openjdk.java.net/browse/JDK-8245925>
>