RFR(S) 8245925 G1 allocates EDEN region after CDS has executed GC

Ioi Lam ioi.lam at oracle.com
Sat Jun 6 05:51:31 UTC 2020



On 6/5/20 2:38 PM, Jiangli Zhou wrote:
> Hi Ioi,
>
> Thanks for the updated webrev.
>
> To avoid the assert, could you call GCLocker::stall_until_clear in
> MetaspaceShared::preload_and_dump() before grabbing the 'Heap_lock'
> and VMThread::execute(&op), like the following? It would be best to
> check with Thomas and other GC experts about the interaction between
> 'Heap_lock' and 'GCLocker'.
>
>          if (HeapShared::is_heap_object_archiving_allowed()) {
>            GCLocker::stall_until_clear();
>          }
Hi Jiangli,

Thanks for the review.

I don't think this is sufficient. Immediately after 
GCLocker::stall_until_clear() has returned, another thread can invoke a 
JNI function that will activate the GCLocker again. This is stated in the 
gcLocker.hpp comments:

http://hg.openjdk.java.net/jdk/jdk/file/882b61be2c19/src/hotspot/share/gc/shared/gcLocker.hpp#l109
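
To make the window concrete, here is roughly the interleaving I am worried
about (just a sketch of the suggested ordering, not actual patch code):

    if (HeapShared::is_heap_object_archiving_allowed()) {
      GCLocker::stall_until_clear();  // returns once no thread is in a JNI critical section
    }
    // <-- window: another Java thread may enter a JNI critical section here
    //     (e.g. via GetPrimitiveArrayCritical) and re-activate the GCLocker
    MutexLocker ml(THREAD, Heap_lock);
    VMThread::execute(&op);           // the full GC inside doit() could then hit the assert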

I've pinged the GC team to get their opinion on this.

> There is also a compiler error caused by the extra 'THREAD' arg for
> MutexLocker. Please fix:

I think you are using an older repo. The MutexLocker constructor was 
changed in Jan 2020:

http://hg.openjdk.java.net/jdk/jdk/annotate/882b61be2c19/src/hotspot/share/runtime/mutexLocker.hpp#l210
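
For reference, the difference is roughly this (I'm paraphrasing the
declarations, so treat it as a sketch; the link above has the exact code):

    // Older repos (what your build sees, per the error output below):
    //   MutexLocker(Mutex* mutex, Thread* thread, Mutex::SafepointCheckFlag flag = ...);
    // Current jdk/jdk (since Jan 2020) takes the Thread* first, so this compiles:
    MutexLocker ml(THREAD, Heap_lock);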

Thanks
- Ioi

> mutexLocker.hpp:205:22: note:   no known conversion for argument 1 from ‘Thread*’ to ‘Mutex*’
>
>    205 |   MutexLocker(Mutex* mutex, Thread* thread, Mutex::SafepointCheckFlag flag = Mutex::_safepoint_check_flag) :
>
>        |               ~~~~~~~^~~~~
>
> mutexLocker.hpp:191:3: note: candidate: ‘MutexLocker::MutexLocker(Mutex*, Mutex::SafepointCheckFlag)’
>
>    191 |   MutexLocker(Mutex* mutex, Mutex::SafepointCheckFlag flag = Mutex::_safepoint_check_flag) :
>
>     ... (rest of output omitted)
>
> Best,
> Jiangli
>
> On Thu, Jun 4, 2020 at 9:37 PM Ioi Lam <ioi.lam at oracle.com> wrote:
>> Hi Jiangli,
>>
>> Updated webrev is here:
>>
>> http://cr.openjdk.java.net/~iklam/jdk15/8245925-g1-eden-after-cds-gc.v03/
>>
>> The only difference with the previous version is this part:
>>
>> 1975 MutexLocker ml(THREAD, HeapShared::is_heap_object_archiving_allowed() ?
>> 1976                Heap_lock : NULL);     // needed for collect_as_vm_thread
>>
>> Thanks
>> - Ioi
>>
>>
>> On 6/4/20 5:23 PM, Jiangli Zhou wrote:
>>> Ioi, do you have a new webrev?
>>>
>>> Best,
>>> Jiangli
>>>
>>> On Thu, Jun 4, 2020 at 4:54 PM Ioi Lam <ioi.lam at oracle.com> wrote:
>>>>
>>>> On 6/4/20 12:04 PM, Jiangli Zhou wrote:
>>>>> On Wed, Jun 3, 2020 at 10:56 PM Ioi Lam <ioi.lam at oracle.com> wrote:
>>>>>>> On 5/29/20 9:40 PM, Yumin Qi wrote:
>>>>>>>> HI, Ioi
>>>>>>>>
>>>>>>>>      If the allocation of EDEN happens between GC and dump, should we
>>>>>>>> put the GC action in VM_PopulateDumpSharedSpace? This way, no
>>>>>>>> allocation should happen at the safepoint. The stack trace showed it
>>>>>>>> happened on a Java thread, which should be blocked at the safepoint.
>>>>>>>>
>>>>>>> Hi Yumin,
>>>>>>>
>>>>>>> I think a GC cannot be executed while we are already inside a safepoint,
>>>>>>> because parts of the GC need to run at a safepoint of their own, so they
>>>>>>> would be blocked until VM_PopulateDumpSharedSpace::doit has returned.
>>>>>>>
>>>>>>> Anyway, as I mentioned in my reply to Jiangli, there's a better way to
>>>>>>> fix this, so I will withdraw the current patch.
>>>>>>>
>>>>>> Hi Yumin,
>>>>>>
>>>>>> Actually, I changed my mind again, and implemented your suggestion :-)
>>>>>>
>>>>>> There's actually a way to invoke GC inside a safepoint (it's used by
>>>>>> "jcmd gc.heap_dump", for example). So I changed the CDS code to do the
>>>>>> same thing. It's a much simpler change and does what I want -- no other
>>>>>> thread will be able to make any heap allocation after the GC has
>>>>>> completed, so no EDEN region will be allocated:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~iklam/jdk15/8245925-g1-eden-after-cds-gc.v02/
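
For reference, the shape of the v02 change is roughly the following (the
webrev above has the exact code; this is just a sketch):

    void VM_PopulateDumpSharedSpace::doit() {
      // We are already inside the safepoint, running in the VM thread.
      if (GCLocker::is_active()) {
        // Some thread is in a JNI critical section; bail out and let the
        // caller retry.
        log_debug(cds)("GCLocker::is_active() ... try again");
        return;
      }
      if (HeapShared::is_heap_object_archiving_allowed()) {
        // Heap_lock is held by the caller; this runs a full GC that clears
        // soft references and compacts the heap.
        Universe::heap()->collect_as_vm_thread(GCCause::_archive_time_gc);
      }
      // ... continue dumping; no other thread can allocate after this point
    }
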
>>>>> Going with the simple approach for the short term sounds ok.
>>>>>
>>>>> 1612 void VM_PopulateDumpSharedSpace::doit() {
>>>>> ...
>>>>> 1615     if (GCLocker::is_active()) {
>>>>> 1616       // This should rarely happen during -Xshare:dump, if at all, but just to be safe.
>>>>> 1617       log_debug(cds)("GCLocker::is_active() ... try again");
>>>>> 1618       return;
>>>>> 1619     }
>>>>>
>>>>> MetaspaceShared::preload_and_dump()
>>>>> ...
>>>>> 1945     while (true) {
>>>>> 1946       {
>>>>> 1947         MutexLocker ml(THREAD, Heap_lock);
>>>>> 1948         VMThread::execute(&op);
>>>>> 1949       }
>>>>> 1950       // If dumping has finished, the VM would have exited. The only reason to
>>>>> 1951       // come back here is to wait for the GCLocker.
>>>>> 1952       assert(HeapShared::is_heap_object_archiving_allowed(), "sanity");
>>>>> 1953       os::naked_short_sleep(1);
>>>>> 1954     }
>>>>> 1955   }
>>>>>
>>>>>
>>>>> Instead of doing the while/retry, wouldn't calling
>>>>> GCLocker::stall_until_clear() in MetaspaceShared::preload_and_dump
>>>>> before executing VM_PopulateDumpSharedSpace be much cleaner?
>>>> Hi Jiangli, I tried your suggestion, but GCLocker::stall_until_clear()
>>>> cannot be called in the VM thread:
>>>>
>>>> #  assert(thread->is_Java_thread()) failed: just checking
>>>>
>>>>> Please also add a comment in MetaspaceShared::preload_and_dump to
>>>>> explain that Universe::heap()->collect_as_vm_thread expects that
>>>>> Heap_lock is already held by the thread, and that's the reason for the
>>>>> call to MutexLocker ml(THREAD, Heap_lock).
>>>> I changed the code to be like this:
>>>>
>>>>        MutexLocker ml(THREAD, HeapShared::is_heap_object_archiving_allowed() ?
>>>>                       Heap_lock : NULL);     // needed for collect_as_vm_thread
>>>>
>>>>> Do you also need to call
>>>>> Universe::heap()->soft_ref_policy()->set_should_clear_all_soft_refs(true)
>>>>> before GC?
>>>> There's no need:
>>>>
>>>> void CollectedHeap::collect_as_vm_thread(GCCause::Cause cause) {
>>>>         ...
>>>>        case GCCause::_archive_time_gc:
>>>>        case GCCause::_metadata_GC_clear_soft_refs: {
>>>>          HandleMark hm;
>>>>          do_full_collection(true);         // do clear all soft refs
>>>>
>>>> Thanks
>>>> - Ioi
>>>>
>>>>> Best,
>>>>> Jiangli
>>>>>
>>>>>> The check for "if (GCLocker::is_active())" should almost always be
>>>>>> false. I left it there just for safety:
>>>>>>
>>>>>> During -Xshare:dump, we execute Java code to build the module graph,
>>>>>> load classes, etc. So theoretically someone could try to parallelize
>>>>>> some of that Java code in the future. In that case, when CDS has entered
>>>>>> the safepoint, another thread could be in the middle of a JNI method
>>>>>> that holds the GCLocker.
>>>>>>
>>>>>> Thanks
>>>>>> - Ioi
>>>>>>
>>>>>>>> On 5/29/20 7:29 PM, Ioi Lam wrote:
>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8245925
>>>>>>>>> http://cr.openjdk.java.net/~iklam/jdk15/8245925-g1-eden-after-cds-gc.v01/
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Summary:
>>>>>>>>>
>>>>>>>>> CDS supports archived heap objects only for G1. During -Xshare:dump,
>>>>>>>>> CDS executes a full GC so that G1 will compact the heap regions,
>>>>>>>>> leaving
>>>>>>>>> maximum contiguous free space at the top of the heap. Then, the
>>>>>>>>> archived
>>>>>>>>> heap regions are allocated from the top of the heap.
>>>>>>>>>
>>>>>>>>> Under some circumstances, java.lang.ref.Cleaners will execute
>>>>>>>>> after the GC has completed. The cleaners may allocate or
>>>>>>>>> synchronize, which will cause G1 to allocate an EDEN region at
>>>>>>>>> the top of the heap.
>>>>>>>>>
>>>>>>>>> The fix is simple -- after CDS has entered a safepoint, if EDEN
>>>>>>>>> regions exist,
>>>>>>>>> exit the safepoint, run GC, and try again. Eventually all the
>>>>>>>>> cleaners will
>>>>>>>>> be executed and no more allocation can happen.
>>>>>>>>>
>>>>>>>>> For safety, I limit the retry count to 30 (about 9 seconds in total).
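
(For the curious, the retry described above has roughly the following shape.
This is just a sketch of the idea, not the actual v01 patch; in particular the
sleep interval is my own guess to match the ~9 seconds mentioned.)

    for (int attempts = 0; attempts < 30; attempts++) {
      // Full GC; compacts the G1 heap so free space is at the top.
      Universe::heap()->collect(GCCause::_archive_time_gc);
      VM_PopulateDumpSharedSpace op;
      VMThread::execute(&op);      // inside the safepoint, doit() returns early if an
                                   // EDEN region showed up at the top of the heap;
                                   // on success the VM exits before coming back here
      os::naked_short_sleep(300);  // give pending Cleaners a chance to run (assumed ~300 ms)
    }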
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> - Ioi
>>>>>>>>>
