RFR(S) 8245925 G1 allocates EDEN region after CDS has executed GC

Sat May 30 05:44:14 UTC 2020

On 5/29/20 8:40 PM, Jiangli Zhou wrote:
> On Fri, May 29, 2020 at 7:30 PM Ioi Lam <ioi.lam at oracle.com> wrote:
>> https://bugs.openjdk.java.net/browse/JDK-8245925
>> http://cr.openjdk.java.net/~iklam/jdk15/8245925-g1-eden-after-cds-gc.v01/
>>
>>
>> Summary:
>>
>> CDS supports archived heap objects only for G1. During -Xshare:dump,
>> CDS executes a full GC so that G1 will compact the heap regions, leaving
>> maximum contiguous free space at the top of the heap. Then, the archived
>> heap regions are allocated from the top of the heap.
>>
>> Under some circumstances, java.lang.ref.Cleaners will execute
>> after the GC has completed. The cleaners may allocate or synchronized, which
>> will cause G1 to allocate an EDEN region at the top of the heap.
> This is an interesting one. Please give more details on under what
> circumstances java.lang.ref.Cleaners causes the issue. It's unclear to
> me why it hasn't been showing up before.

Hi Jiangli,

Thanks for the review. It's very helpful.

The assert (see my comment in JDK-8245925) happened in my prototype for 
JDK-8244778

http://cr.openjdk.java.net/~iklam/jdk15/8244778-archive-full-module-graph.v00.8/

I have to archive AppClassLoader and PlatformClassLoader, but need to 
clear their URLClassPath field (the "ucp" field). See 
clear_loader_states in metaspaceShared.cpp. Because of this, some 
java.util.zip.ZipFiles referenced by the URLClassPath  become garbage, 
and their Cleaners are executed after full GC has finished.

I think the bug has always existed, but is just never triggered because 
we have not activated the Cleaners.

>
>>
>> The fix is simple -- after CDS has entered a safepoint, if EDEN regions
>> exist,
>> exit the safepoint, run GC, and try again. Eventually all the cleaners will
>> be executed and no more allocation can happen.
>>
>> For safety, I limit the retry count to 30 (or about total 9 seconds).
>>
> I think it's better to skip the top allocated region(s) in such cases
> and avoid retrying. Dump time performance is important, as we are
> moving the cost from runtime to CDS dump time. It's desirable to keep
> the dump time cost as low as possible, so using CDS delivers better
> net gain overall.
>
> Here are some comments for your current webrev itself.
>
> 1611 static bool has_unwanted_g1_eden_regions() {
> 1612 #if INCLUDE_G1GC
> 1613   return HeapShared::is_heap_object_archiving_allowed() && UseG1GC &&
> 1614          G1CollectedHeap::heap()->eden_regions_count() > 0;
> 1615 #else
> 1616   return false;
> 1617 #endif
> 1618 }
>
> You can remove 'UseG1GC' from line 1613, as
> is_heap_object_archiving_allowed() check already covers it:
>
> static bool is_heap_object_archiving_allowed() {
>    CDS_JAVA_HEAP_ONLY(return (UseG1GC && UseCompressedOops &&
> UseCompressedClassPointers);)
>    NOT_CDS_JAVA_HEAP(return false;)
> }
>
> Please include heap archiving code under #if INCLUDE_CDS_JAVA_HEAP.
> It's better to extract the GC handling code in
> VM_PopulateDumpSharedSpace::doit() into a separate API in
> heapShared.*.
>
> It's time to enhance heap archiving to use a separate buffer when
> copying the objects at dump time (discussed before), as a longer term
> fix. I'll file a RFE.

Thanks for reminding me. I think that is a better way to fix this 
problem. It should be fairly easy to do, as we can already relocate the 
heap regions using HeapShared::patch_archived_heap_embedded_pointers(). 
Let me try to implement it.

BTW, the GC speed is very fast, because the heap is not used very much 
during -Xshare:dump. -Xlog:gc shows:

[0.259s][info][gc ] GC(0) Pause Full (Full GC for -Xshare:dump) 
4M->1M(32M) 8.220ms

So we have allocated only 4MB of objects, and only 1MB of those are 
reachable.

Anyway, I think we can even avoid running the GC altogether. We can scan 
for contiguous free space from the top of the heap (below the EDEN 
region). If there's more contiguous free space than the current 
allocated heap regions, we know for sure that we can archive all the 
heap objects that we need without doing the GC. That can be done as an 
RFE after copying the objects. It won't save much though (8ms out of 
about 700ms of total -Xshare:dump time).

I'll withdraw this patch for now, and will try to implement the object 
copying.

Thanks
- Ioi

>
> Best,
> Jiangli
>
>> Thanks
>> - Ioi
>>
>>
>> <https://bugs.openjdk.java.net/browse/JDK-8245925>