RFR(S) 8245925 G1 allocates EDEN region after CDS has executed GC

Thu Jun 4 23:54:26 UTC 2020

On 6/4/20 12:04 PM, Jiangli Zhou wrote:
> On Wed, Jun 3, 2020 at 10:56 PM Ioi Lam <ioi.lam at oracle.com> wrote:
>>
>>> On 5/29/20 9:40 PM, Yumin Qi wrote:
>>>> HI, Ioi
>>>>
>>>>    If the allocation of EDEN happens between GC and dump, should we
>>>> put the GC action in VM_PopulateDumpSharedSpace? This way, at
>>>> safepoint there should no allocation happens. The stack trace showed
>>>> it happened with a Java Thread, which should be blocked at safepoint.
>>>>
>>> Hi Yumin,
>>>
>>> I think GCs cannot be executed inside a safepoint, because some parts
>>> of GC need to execute in a safepoint, so they will be blocked until
>>> VM_PopulateDumpSharedSpace::doit has returned.
>>>
>>> Anyway, as I mentioned in my reply to Jiangli, there's a better way to
>>> fix this, so I will withdraw the current patch.
>>>
>> Hi Yumin,
>>
>> Actually, I changed my mind again, and implemented your suggestion :-)
>>
>> There's actually a way to invoke GC inside a safepoint (it's used by
>> "jcmd gc.heap_dump", for example). So I changed the CDS code to do the
>> same thing. It's a much simpler change and does what I want -- no other
>> thread will be able to make any heap allocation after the GC has
>> completed, so no EDEN region will be allocated:
>>
>> http://cr.openjdk.java.net/~iklam/jdk15/8245925-g1-eden-after-cds-gc.v02/
> Going with the simple approach for the short term sounds ok.
>
> 1612 void VM_PopulateDumpSharedSpace::doit() {
> ...
> 1615     if (GCLocker::is_active()) {
> 1616       // This should rarely happen during -Xshare:dump, if at
> all, but just to be safe.
> 1617       log_debug(cds)("GCLocker::is_active() ... try again");
> 1618       return;
> 1619     }
>
> MetaspaceShared::preload_and_dump()
> ...
> 1945     while (true) {
> 1946       {
> 1947         MutexLocker ml(THREAD, Heap_lock);
> 1948         VMThread::execute(&op);
> 1949       }
> 1950       // If dumping has finished, the VM would have exited. The
> only reason to
> 1951       // come back here is to wait for the GCLocker.
> 1952       assert(HeapShared::is_heap_object_archiving_allowed(), "sanity");
> 1953       os::naked_short_sleep(1);
> 1954     }
> 1955   }
>
>
> Instead of doing the while/retry, calling
> GCLocker::stall_until_clear() in MetaspaceShared::preload_and_dump
> before VM_PopulateDumpSharedSpace probably is much cleaner?

Hi Jiangli, I tried your suggestion, but GCLocker::stall_until_clear()  
cannot be called in the VM thread:

#  assert(thread->is_Java_thread()) failed: just checking

> Please also add a comment in MetaspaceShared::preload_and_dump to
> explain that Universe::heap()->collect_as_vm_thread expects that
> Heap_lock is already held by the thread, and that's the reason for the
> call to MutexLocker ml(THREAD, Heap_lock).

I changed the code to be like this:

     MutexLocker ml(THREAD, HeapShared::is_heap_object_archiving_allowed() ?
                    Heap_lock : NULL);     // needed for 
collect_as_vm_thread

> Do you also need to call
> Universe::heap()->soft_ref_policy()->set_should_clear_all_soft_refs(true)
> before GC?

There's no need:

void CollectedHeap::collect_as_vm_thread(GCCause::Cause cause) {
      ...
     case GCCause::_archive_time_gc:
     case GCCause::_metadata_GC_clear_soft_refs: {
       HandleMark hm;
       do_full_collection(true);         // do clear all soft refs

Thanks
- Ioi

> Best,
> Jiangli
>
>> The check for "if (GCLocker::is_active())" should almost always be
>> false. I left it there just for safety:
>>
>> During -Xshare:dump, we execute Java code to build the module graph,
>> load classes, etc. So theoretically someone could try to parallelize
>> some of that Java code in the future. Theoretically when CDS has entered
>> the safepoint, another thread could be in the middle a JNI method that
>> has held the GCLock.
>>
>> Thanks
>> - Ioi
>>
>>>> On 5/29/20 7:29 PM, Ioi Lam wrote:
>>>>> https://bugs.openjdk.java.net/browse/JDK-8245925
>>>>> http://cr.openjdk.java.net/~iklam/jdk15/8245925-g1-eden-after-cds-gc.v01/
>>>>>
>>>>>
>>>>>
>>>>> Summary:
>>>>>
>>>>> CDS supports archived heap objects only for G1. During -Xshare:dump,
>>>>> CDS executes a full GC so that G1 will compact the heap regions,
>>>>> leaving
>>>>> maximum contiguous free space at the top of the heap. Then, the
>>>>> archived
>>>>> heap regions are allocated from the top of the heap.
>>>>>
>>>>> Under some circumstances, java.lang.ref.Cleaners will execute
>>>>> after the GC has completed. The cleaners may allocate or
>>>>> synchronized, which
>>>>> will cause G1 to allocate an EDEN region at the top of the heap.
>>>>>
>>>>> The fix is simple -- after CDS has entered a safepoint, if EDEN
>>>>> regions exist,
>>>>> exit the safepoint, run GC, and try again. Eventually all the
>>>>> cleaners will
>>>>> be executed and no more allocation can happen.
>>>>>
>>>>> For safety, I limit the retry count to 30 (or about total 9 seconds).
>>>>>
>>>>> Thanks
>>>>> - Ioi
>>>>>
>>>>>
>>>>> <https://bugs.openjdk.java.net/browse/JDK-8245925>